datajure.io

Clojure only.

read
read-seq
write

Unified file I/O for datajure. Dispatches on file extension. Natively supports: CSV, TSV, JSON, JSON Lines (.jsonl/.ndjson), nippy (and .gz variants of all). Parquet, xlsx/xls, and Arrow require optional extra dependencies.
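
To make the extension-based dispatch concrete, here is a minimal sketch in plain Clojure. The `file-format` helper is hypothetical and is not part of datajure; it only illustrates how a path such as `data.tsv.gz` could be mapped to a format plus a compression flag, using the extensions listed above.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper, NOT datajure's implementation: derive a format
;; keyword from the file extension and note whether the file is gzipped.
(defn file-format
  [path]
  (let [lower (str/lower-case path)
        gz?   (str/ends-with? lower ".gz")
        ;; Strip a trailing ".gz" so "data.tsv.gz" is treated as "data.tsv".
        base  (if gz? (subs lower 0 (- (count lower) 3)) lower)
        dot   (str/last-index-of base ".")
        ext   (when dot (subs base (inc dot)))]
    {:format   (case ext
                 "csv"               :csv
                 "tsv"               :tsv
                 "json"              :json
                 ("jsonl" "ndjson")  :jsonl
                 "nippy"             :nippy
                 "parquet"           :parquet
                 ("xlsx" "xls")      :excel
                 ("arrow" "feather") :arrow
                 :unknown)
     :gzipped? gz?}))

(file-format "data.tsv.gz")   ;; => {:format :tsv, :gzipped? true}
(file-format "events.ndjson") ;; => {:format :jsonl, :gzipped? false}
```

The format keywords (`:excel`, `:arrow`, `:unknown`) are illustrative names chosen for this sketch only.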

read

(read path)

(read path options)

Read a dataset from a file. Dispatches on file extension.

Natively supported: .csv .tsv .json .jsonl/.ndjson .nippy (and .gz variants).
Optional deps required: .parquet .xlsx .xls .arrow .feather.

Options are passed through to the underlying tech.v3.dataset reader.
Column names are returned as keywords by default (:key-fn keyword).

Examples:
  (read "data.csv")
  (read "data.json")
  (read "data.jsonl")        ;; JSON Lines — one object per line
  (read "data.parquet")
  (read "data.tsv.gz")
  (read "data.csv" {:separator \tab})

read-seq

(read-seq path)

(read-seq path options)

Read a file as a lazy sequence of datasets.

- Parquet streams in row-group chunks — genuinely incremental, suitable for
  files larger than memory.
- JSON Lines (.jsonl / .ndjson, and .gz variants) streams in batches of
  :batch-size rows (default 100000) — also genuinely incremental. Fully
  consume the sequence (e.g. inside a doseq) so the underlying file handle
  is released.
- JSON (.json) is a single array-of-objects document with no chunk
  boundaries, so it is read whole and yielded as a one-element lazy sequence.
  This gives no streaming/memory benefit — it exists only so a `read-seq`
  call site works uniformly across formats. Reach for Parquet or JSON Lines
  for true out-of-core reads.

Examples:
  (read-seq "huge.parquet")                      ;; many chunks, streamed
  (read-seq "huge.jsonl" {:batch-size 50000})    ;; streamed in 50k-row chunks
  (doseq [chunk (read-seq "data.json")]           ;; exactly one chunk
    (process chunk))
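
As a sketch of fully consuming the sequence, the following counts rows across all chunks with `reduce`, so only one chunk needs to be in use at a time and the sequence is walked to its end. The `dio` alias and the `total-rows` helper are assumptions made for this example; `tech.v3.dataset/row-count` is used to size each chunk, and both libraries are assumed to be on the classpath.

```clojure
(require '[datajure.io :as dio]
         '[tech.v3.dataset :as ds])

;; Hypothetical helper: sum the row counts of every chunk that read-seq yields.
;; reduce walks the whole sequence, which matches the advice above to consume
;; it fully so the underlying file handle can be released.
(defn total-rows
  [path options]
  (reduce (fn [n chunk] (+ n (ds/row-count chunk)))
          0
          (dio/read-seq path options)))

;; Example call, assuming a large JSON Lines file exists at this path:
;; (total-rows "huge.jsonl" {:batch-size 50000})
```

For a `.json` path the same helper still works, but it sees exactly one chunk, so it offers no memory advantage over `read`.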

write

(write dataset path)

(write dataset path options)

Write a dataset to a file. Dispatches on file extension.

Natively supported: .csv .tsv .json .jsonl/.ndjson .nippy (and .gz variants).
Optional deps required: .parquet .xlsx.

Options are passed through to the underlying tech.v3.dataset writer
(JSON Lines encodes each row independently and ignores writer options).

Examples:
  (write ds "output.csv")
  (write ds "output.json")
  (write ds "output.jsonl")     ;; JSON Lines — one object per line
  (write ds "output.parquet")
  (write ds "output.tsv.gz")
  (write ds "output.csv" {:separator \tab})
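
A minimal round-trip sketch, assuming datajure and tech.v3.dataset are on the classpath. The `dio` alias is a choice made for this example; requiring the namespace under an alias avoids confusing its `read` with `clojure.core/read`. The dataset contents and the temporary file are illustrative only.

```clojure
(require '[datajure.io :as dio]
         '[tech.v3.dataset :as ds])

;; A small in-memory dataset to write out (illustrative data).
(def people
  (ds/->dataset {:name ["Ada" "Grace"]
                 :age  [36 45]}))

;; Write to a temporary .csv file; the extension selects the CSV writer.
(def csv-path
  (str (java.io.File/createTempFile "people" ".csv")))

(dio/write people csv-path)

;; Read it back; the extension selects the CSV reader.
(def loaded (dio/read csv-path))

(ds/row-count loaded)
(ds/column-names loaded)
```

Swapping the suffix passed to `createTempFile` for another natively supported extension, such as `.tsv.gz`, should exercise the corresponding writer and reader without other changes.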
