Unified file I/O for datajure. Dispatches on file extension. Natively supports: CSV, TSV, JSON, JSON Lines (.jsonl/.ndjson), nippy (and .gz variants of all). Parquet, xlsx/xls, and Arrow require optional extra dependencies.
(read path)
(read path options)
Read a dataset from a file. Dispatches on file extension.
Natively supported: .csv .tsv .json .jsonl/.ndjson .nippy (and .gz variants).
Optional deps required: .parquet .xlsx .xls .arrow .feather.
Options are passed through to the underlying tech.v3.dataset reader.
Columns are returned as keywords by default (:key-fn keyword).
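As a rough illustration of extension-based dispatch, the sketch below maps a path to a format keyword plus a gzip flag. `file-format` is a hypothetical helper, not part of datajure, and the library's real dispatch may differ (for example, in which formats accept a .gz suffix).

```clojure
(require '[clojure.string :as str])

(defn file-format
  "Hypothetical helper (not datajure's implementation): returns
  [format gzipped?] for a path, based only on its extension."
  [path]
  (let [lower    (str/lower-case path)
        gzipped? (str/ends-with? lower ".gz")
        ;; Strip a trailing ".gz" so "data.tsv.gz" dispatches on "tsv".
        base     (if gzipped? (subs lower 0 (- (count lower) 3)) lower)
        dot      (str/last-index-of base ".")
        ext      (when dot (subs base (inc dot)))]
    [(case ext
       "csv"               :csv
       "tsv"               :tsv
       "json"              :json
       ("jsonl" "ndjson")  :jsonl
       "nippy"             :nippy
       "parquet"           :parquet
       ("xlsx" "xls")      :excel
       ("arrow" "feather") :arrow
       (throw (ex-info "Unsupported file extension" {:path path})))
     gzipped?]))

(file-format "data.tsv.gz") ;; => [:tsv true]
(file-format "data.ndjson") ;; => [:jsonl false]
```

The sketch simplifies by accepting a .gz suffix on every format; the docstring only promises .gz variants for the natively supported ones.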
Examples:
(read "data.csv")
(read "data.json")
(read "data.jsonl") ;; JSON Lines — one object per line
(read "data.parquet")
(read "data.tsv.gz")
(read "data.csv" {:separator \tab})
(read-seq path)
(read-seq path options)
Read a file as a lazy sequence of datasets.
- Parquet streams in row-group chunks — genuinely incremental, suitable for
files larger than memory.
- JSON Lines (.jsonl / .ndjson, and .gz variants) streams in batches of
:batch-size rows (default 100000) — also genuinely incremental. Fully
consume the sequence (e.g. inside a doseq) so the underlying file handle
is released.
- JSON (.json) is a single array-of-objects document with no chunk
boundaries, so it is read whole and yielded as a one-element lazy sequence.
This gives no streaming/memory benefit — it exists only so a `read-seq`
call site works uniformly across formats. Reach for Parquet or JSON Lines
for true out-of-core reads.
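The advice to fully consume the sequence can be pictured with a small, self-contained sketch of batched lazy reading. `batched-lines` is a hypothetical helper written for illustration only; it is not how datajure implements `read-seq`, and it yields vectors of raw lines rather than datasets. It assumes the reader is closed only once the last batch has been realized.

```clojure
(require '[clojure.java.io :as io])

(defn batched-lines
  "Hypothetical helper: lazy seq of vectors holding up to batch-size
  lines each. The reader is closed only when the seq is exhausted."
  [path batch-size]
  (let [^java.io.Closeable rdr (io/reader path)]
    (letfn [(step [batches]
              (lazy-seq
                (if-let [s (seq batches)]
                  (cons (vec (first s)) (step (rest s)))
                  ;; No batches left: release the file handle.
                  (do (.close rdr) nil))))]
      (step (partition-all batch-size (line-seq rdr))))))

;; Walking every batch inside doseq reaches the end of the file,
;; so the reader is closed as a side effect of full consumption.
(comment
  (doseq [batch (batched-lines "huge.jsonl" 50000)]
    (println (count batch))))
```

In this sketch, taking only the first batch and dropping the rest would leave the reader open, which is the situation the note above warns about.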
Examples:
(read-seq "huge.parquet") ;; many chunks, streamed
(read-seq "huge.jsonl" {:batch-size 50000}) ;; streamed in 50k-row chunks
(doseq [chunk (read-seq "data.json")] ;; exactly one chunk
  (process chunk))
(write dataset path)
(write dataset path options)
Write a dataset to a file. Dispatches on file extension.
Natively supported: .csv .tsv .json .jsonl/.ndjson .nippy (and .gz variants).
Optional deps required: .parquet .xlsx.
Options are passed through to the underlying tech.v3.dataset writer
(JSON Lines encodes each row independently and ignores writer options).
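One way to picture a .gz variant is as the same text format wrapped in a gzip stream. The sketch below uses only `clojure.java.io` and `java.util.zip`; `spit-gz` and `slurp-gz` are hypothetical helpers for illustration, and this assumes (without confirmation from the source) that the .gz variants are ordinary gzip-compressed files.

```clojure
(require '[clojure.java.io :as io])
(import '(java.util.zip GZIPInputStream GZIPOutputStream))

(defn spit-gz
  "Hypothetical helper: write text to path through a gzip stream."
  [path ^String text]
  (with-open [w (io/writer (GZIPOutputStream. (io/output-stream path)))]
    (.write w text)))

(defn slurp-gz
  "Hypothetical helper: read a gzip-compressed text file back into a string."
  [path]
  (with-open [r (io/reader (GZIPInputStream. (io/input-stream path)))]
    (slurp r)))

(comment
  (spit-gz "output.tsv.gz" "a\tb\n1\t2\n")
  (slurp-gz "output.tsv.gz"))
```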
Examples:
(write ds "output.csv")
(write ds "output.json")
(write ds "output.jsonl") ;; JSON Lines — one object per line
(write ds "output.parquet")
(write ds "output.tsv.gz")
(write ds "output.csv" {:separator \tab})