datahike.pg.sql.copy.csv-format

PostgreSQL COPY-IN CSV-format decoder. Quote-aware state machine matching PG's CopyReadAttributesCSV semantics from ../postgres/src/backend/commands/copyfromparse.c:1827.

CSV is a different beast from text format:

  • Backslash is a literal char (no escape sequences).
  • End-of-data marker \. is not recognised inside CSV streams; the stream ends with CopyDone or EOF instead.
  • NULL detection only fires on unquoted fields whose raw text matches the null marker. "" is empty-string, never null (unless FORCE_NULL is set for that column).
  • Embedded delimiters / line terminators are allowed inside quoted fields.
  • Embedded quote chars in quoted fields are escaped by either:
    • doubling them (default: ESCAPE = QUOTE = ")
    • prefixing with the configured escape char
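
The two escaping styles can be illustrated with a minimal sketch. `unescape-quoted` is a hypothetical helper, not part of this namespace; it assumes the quote and escape options are single chars and that it receives only the text between a field's opening and closing quote.

```clojure
(defn unescape-quoted
  "Hypothetical helper: de-escape the text found between the opening and
  closing quote of a CSV field. `quote-ch` and `escape-ch` are chars."
  [s quote-ch escape-ch]
  (loop [cs (seq s), out []]
    (if-let [[c & more] cs]
      (let [nxt (first more)]
        (if (and (= c escape-ch)
                 (or (= nxt quote-ch) (= nxt escape-ch)))
          ;; escape followed by quote or escape: keep only the second char
          (recur (next more) (conj out nxt))
          ;; anything else is literal, including a backslash that is not
          ;; the configured escape char
          (recur more (conj out c))))
      (apply str out))))

;; Default (ESCAPE = QUOTE = "): embedded quotes are doubled.
(unescape-quoted "say \"\"hi\"\"" \" \")   ; => "say \"hi\""

;; Assumed custom escape char \ : the quote is prefixed instead.
(unescape-quoted "a\\\"b" \" \\)           ; => "a\"b"
```

With the default settings a backslash is never special, so `a\nb` inside quotes stays as the four characters `a`, `\`, `n`, `b`.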

The state machine per row:

  • Start in NOT_QUOTED. Walk bytes:
    • delimiter → end of field
    • line terminator → end of row
    • quote char → enter QUOTED, set saw_quote=true
    • else → append to field

  • In QUOTED, walk bytes:
    • escape char (peek next):
      • if next is escape or quote → consume, append literal
      • else fall through (treat as literal)
    • quote char → exit QUOTED (back to NOT_QUOTED)
    • else → append to field

  • At end of row, for each field:
    • if !saw_quote AND raw == null_marker → field is ::null
    • else → field is the de-escaped string

  • FORCE_NOT_NULL columns: skip the null check (always treated as non-null even if raw matches null_marker).

  • FORCE_NULL columns: NULL check applies even if quoted (so a quoted "" matching null_marker becomes null instead of empty string).
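
The per-row state machine described above can be sketched as a small self-contained function. This is an illustration, not the library's implementation. It makes several assumptions: it works on chars of a string rather than bytes, treats only `\newline` as a line terminator, takes `:force-null` and `:force-not-null` as sets of zero-based column indexes, and lets FORCE_NOT_NULL win when a column appears in both sets. `parse-row` and `null-value` are hypothetical names.

```clojure
;; What ::null reads as inside the datahike.pg.sql.copy.csv-format namespace.
(def null-value :datahike.pg.sql.copy.csv-format/null)

(defn parse-row
  "Sketch: decode the first CSV row of string `s` into a vector of
  Strings and null markers."
  [s {:keys [delimiter escape null-marker force-null force-not-null]
      quote-ch :quote
      :or {delimiter \, quote-ch \" escape \" null-marker ""
           force-null #{} force-not-null #{}}}]
  (let [finish (fn [fields raw saw-quote?]
                 ;; End of field: decide between null and the de-escaped text.
                 (let [idx   (count fields)
                       text  (apply str raw)
                       null? (and (= text null-marker)
                                  (not (contains? force-not-null idx))
                                  (or (not saw-quote?)
                                      (contains? force-null idx)))]
                   (conj fields (if null? null-value text))))]
    (loop [cs (seq s), state :not-quoted, raw [], saw-quote? false, fields []]
      (let [c (first cs), nxt (second cs)]
        (cond
          ;; input exhausted: close the last field
          (nil? c) (finish fields raw saw-quote?)

          (= state :quoted)
          (cond
            ;; escape followed by escape or quote: append the second char
            (and (= c escape) (or (= nxt escape) (= nxt quote-ch)))
            (recur (nnext cs) :quoted (conj raw nxt) saw-quote? fields)
            ;; closing quote: back to NOT_QUOTED
            (= c quote-ch)
            (recur (next cs) :not-quoted raw saw-quote? fields)
            :else
            (recur (next cs) :quoted (conj raw c) saw-quote? fields))

          ;; NOT_QUOTED from here on
          (= c delimiter)
          (recur (next cs) :not-quoted [] false (finish fields raw saw-quote?))
          (= c \newline) (finish fields raw saw-quote?)
          (= c quote-ch) (recur (next cs) :quoted raw true fields)
          :else (recur (next cs) :not-quoted (conj raw c) saw-quote? fields))))))

;; Unquoted empty field is null, quoted "" is the empty string.
(parse-row "a,\"b,c\",,\"\"" {})
;; => ["a" "b,c" :datahike.pg.sql.copy.csv-format/null ""]
```

Passing `{:force-null #{0}}` makes a quoted `""` in column 0 decode to the null marker, while `{:force-not-null #{0}}` makes an unquoted empty field in column 0 decode to `""`.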

Streaming API mirrors text-format:

  (def d (make-decoder opts))
  [d' rows eod?] = (decode-step d chunk)
  [rows eod?]    = (decode-finalize d')

opts keys: :delimiter, :null-marker, :quote, :escape, :header (:true|:false|:match), :force-not-null, :force-null, :columns (used when HEADER MATCH is on).

Output is a vector of rows. Each row is a vector whose elements are either a String or the keyword ::null.
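
Because `::null` is a namespace-qualified keyword, code outside this namespace sees it as `:datahike.pg.sql.copy.csv-format/null`. A minimal sketch of post-processing decoded rows follows; `null-value` and `row->map` are hypothetical names, and replacing the marker with `nil` is just one possible convention.

```clojure
;; What ::null reads as inside the datahike.pg.sql.copy.csv-format namespace.
(def null-value :datahike.pg.sql.copy.csv-format/null)

(defn row->map
  "Hypothetical helper: pair column names with a decoded row, turning the
  null marker into nil and leaving empty strings untouched."
  [column-names row]
  (zipmap column-names
          (map #(if (= % null-value) nil %) row)))

(row->map [:id :name :note] ["1" null-value ""])
;; => {:id "1", :name nil, :note ""}
```

Keeping the marker distinct from `""` until this point preserves the difference between a SQL NULL and an empty string.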

decode-all (clj)

(decode-all opts chunks)

Convenience: decode an entire seq of chunks and return all rows.
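
One way such a convenience could be layered on the streaming API is sketched below. This is not the library's source. To stay self-contained it takes the step and finalize functions as arguments, assumes decoding stops as soon as `eod?` is true, and assumes finalize is only called when no end-of-data was seen. `decode-all-sketch`, `toy-step`, and `toy-finalize` are hypothetical names; the toy decoder only splits lines and does no CSV parsing.

```clojure
(require '[clojure.string :as str])

(defn decode-all-sketch
  "step:     (fn [decoder chunk]) returning [decoder' rows eod?]
   finalize: (fn [decoder])       returning [rows eod?]"
  [step finalize decoder chunks]
  (loop [d decoder, chunks (seq chunks), acc []]
    (if-let [[chunk & more] chunks]
      (let [[d' rows eod?] (step d chunk)
            acc'           (into acc rows)]
        (if eod?
          acc'                        ; end of data signalled: stop early
          (recur d' more acc')))
      ;; chunks exhausted: flush whatever the decoder still buffers
      (let [[rows _eod?] (finalize d)]
        (into acc rows)))))

;; Toy stand-in decoder: the state is the text of the unfinished last line.
(defn toy-step [buf chunk]
  (let [parts (str/split (str buf chunk) #"\n" -1)]
    [(peek parts) (mapv vector (pop parts)) false]))

(defn toy-finalize [buf]
  [(if (empty? buf) [] [[buf]]) false])

;; Rows may straddle chunk boundaries.
(decode-all-sketch toy-step toy-finalize "" ["a\nb" "c\nd\n"])
;; => [["a"] ["bc"] ["d"]]
```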

decode-finalize (clj)

(decode-finalize decoder)

Emit any remaining rows. Returns [rows eod?].

decode-step (clj)

(decode-step decoder chunk)

Process one chunk. Returns [decoder' rows eod?].

make-decoder (clj)

(make-decoder {:keys [delimiter null-marker quote escape header force-not-null
                      force-null columns]
               :or {header :false}
               :as opts})

Build a fresh CSV decoder state.
