PostgreSQL COPY-IN CSV-format decoder. Quote-aware state machine
matching PG's `CopyReadAttributesCSV` semantics from
`../postgres/src/backend/commands/copyfromparse.c:1827`.
CSV is a *different beast* from text format:
- Backslash is a literal char (no escape sequences).
- End-of-data marker `\.` is **not** recognised inside CSV
streams (uses CopyDone / EOF instead).
- NULL detection only fires on **unquoted** fields whose raw
text matches the null marker. `""` is empty-string, never
null (unless FORCE_NULL is set for that column).
- Embedded delimiters / line terminators are allowed inside
quoted fields.
- Embedded quote chars in quoted fields are escaped by either:
  - doubling them (default: ESCAPE = QUOTE = `"`)
  - prefixing with the configured escape char
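
To illustrate the two escaping styles, here is a minimal sketch of de-escaping the text found between a field's opening and closing quote. `unescape-quoted` and its arguments are hypothetical names for illustration, not part of this namespace.

```clojure
(defn unescape-quoted
  "Sketch only: de-escape `body`, the text between the opening and closing
  quote of a field. `quote-ch` and `escape-ch` are single characters."
  [body quote-ch escape-ch]
  (loop [i 0, out []]
    (let [c   (get body i)
          nxt (get body (inc i))]
      (cond
        ;; ran off the end of the string: done
        (nil? c) (apply str out)
        ;; escape followed by escape or quote: keep only the second char
        (and (= c escape-ch) (or (= nxt escape-ch) (= nxt quote-ch)))
        (recur (+ i 2) (conj out nxt))
        ;; anything else, including a backslash when ESCAPE is not backslash,
        ;; is a literal char
        :else (recur (inc i) (conj out c))))))

;; Doubling style (ESCAPE = QUOTE = "):
(unescape-quoted "say \"\"hi\"\"" \" \")
;; Prefix style (ESCAPE = \):
(unescape-quoted "say \\\"hi\\\"" \" \\)
```

Both calls yield the same value, `say "hi"`. With the default escape char a backslash passes through untouched.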
The state machine per row:
- Start in NOT_QUOTED. Walk bytes:
  - delimiter → end of field
  - line terminator → end of row
  - quote char → enter QUOTED, set saw_quote=true
  - else → append to field
- In QUOTED, walk bytes:
  - escape char (peek next):
    - if next is escape or quote → consume, append literal
    - else fall through (treat as literal)
  - quote char → exit QUOTED (back to NOT_QUOTED)
  - else → append to field
- At end of row, for each field:
  - if !saw_quote AND raw == null_marker → field is `::null`
  - else → field is the de-escaped string
- FORCE_NOT_NULL columns: skip the null check (always treated
  as non-null even if raw matches null_marker).
- FORCE_NULL columns: NULL check applies even if quoted (so a
  quoted `""` matching null_marker becomes null instead of
  empty string).
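
The per-row state machine above can be sketched roughly as follows. This is a simplified illustration, not the namespace's implementation: `parse-csv-row` is a hypothetical name, it walks chars of a string rather than bytes, it does not raise on an unterminated quote, and it assumes `:force-not-null` and `:force-null` are sets of zero-based column indexes (the real option format may differ). `::null` here resolves to whatever namespace the sketch is loaded in.

```clojure
(defn parse-csv-row
  "Sketch only: parse one CSV row from string `s` into a vector of fields."
  [s {delim :delimiter nm :null-marker q :quote esc :escape
      fnn :force-not-null fnull :force-null
      :or {delim \, nm "" q \" fnn #{} fnull #{}}}]
  (let [esc    (or esc q) ; ESCAPE defaults to QUOTE
        finish (fn [fields buf saw-quote?]
                 (let [raw      (apply str buf)
                       col      (count fields)
                       matches? (= raw nm)
                       null?    (cond
                                  ;; unquoted match: null unless FORCE_NOT_NULL
                                  (and matches? (not saw-quote?)) (not (contains? fnn col))
                                  ;; quoted match: null only under FORCE_NULL
                                  matches? (contains? fnull col)
                                  :else false)]
                   (conj fields (if null? ::null raw))))]
    (loop [i 0, quoted? false, saw-quote? false, buf [], fields []]
      (let [c (get s i)]
        (cond
          ;; end of input also ends the row
          (nil? c) (finish fields buf saw-quote?)

          quoted?
          (let [nxt (get s (inc i))]
            (cond
              ;; escape followed by escape or quote: consume both, keep the second
              (and (= c esc) (or (= nxt esc) (= nxt q)))
              (recur (+ i 2) true saw-quote? (conj buf nxt) fields)
              ;; bare quote char: leave QUOTED
              (= c q) (recur (inc i) false saw-quote? buf fields)
              :else   (recur (inc i) true saw-quote? (conj buf c) fields)))

          ;; NOT_QUOTED from here on
          (= c delim) (recur (inc i) false false [] (finish fields buf saw-quote?))
          (or (= c \newline) (= c \return)) (finish fields buf saw-quote?)
          (= c q) (recur (inc i) true true buf fields)
          :else   (recur (inc i) false saw-quote? (conj buf c) fields))))))

;; Third field is unquoted and empty, so it matches the default null marker;
;; the fourth is quoted, so it stays an empty string.
(parse-csv-row "a,\"b,c\",,\"\"" {})
```

Keeping the null decision in `finish` means the walk itself only tracks `saw-quote?`, and the FORCE_NOT_NULL / FORCE_NULL overrides stay in one place.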
Streaming API mirrors `text-format`:

    (def d (make-decoder opts))
    [d' rows eod?] = (decode-step d chunk)
    [rows eod?]    = (decode-finalize d')

`opts` keys: `:delimiter`, `:null-marker`, `:quote`, `:escape`,
`:header` (`:true|:false|:match`), `:force-not-null`, `:force-null`,
`:columns` (used when HEADER MATCH is on).
Output is a vector of vectors-of-(String|`::null`).

(decode-all opts chunks)
Convenience: decode an entire seq of chunks and return all rows.
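
One plausible way such a convenience could be composed from the step and finalize functions is sketched below. It is an illustration under assumptions, not the library's code: `decode-all-sketch` takes the three functions as arguments, ignores `eod?`, and is exercised with toy stand-ins (`toy-make-decoder`, `toy-decode-step`, `toy-decode-finalize`) that split on `\n` only and know nothing about quoting.

```clojure
(require '[clojure.string :as str])

;; Toy stand-ins, NOT the library's decoder: rows are split on \n only,
;; with no quoting, and each row is a single-field vector.
(defn toy-make-decoder [_opts] {:buf ""})

(defn toy-decode-step [d chunk]
  (let [parts (str/split (str (:buf d) chunk) #"\n" -1)]
    ;; the last part may be an incomplete row, so carry it to the next step
    [{:buf (peek parts)} (mapv vector (pop parts)) false]))

(defn toy-decode-finalize [d]
  (let [b (:buf d)]
    [(if (empty? b) [] [[b]]) false]))

(defn decode-all-sketch
  "Hypothetical composition: thread the decoder through every chunk, then finalize."
  [make-decoder decode-step decode-finalize opts chunks]
  (let [[d rows]     (reduce (fn [[d acc] chunk]
                               (let [[d' rows _eod?] (decode-step d chunk)]
                                 [d' (into acc rows)]))
                             [(make-decoder opts) []]
                             chunks)
        [tail _eod?] (decode-finalize d)]
    (into rows tail)))

;; A row split across two chunks ("b" + "c") is reassembled by the carried buffer.
(decode-all-sketch toy-make-decoder toy-decode-step toy-decode-finalize
                   {} ["a\nb" "c\nd"])
```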

(decode-finalize decoder)
Emit any remaining rows. Returns [rows eod?].

(decode-step decoder chunk)
Process one chunk. Returns [decoder' rows eod?].

(make-decoder {:keys [delimiter null-marker quote escape header force-not-null
                      force-null columns]
               :or {header :false}
               :as opts})
Build a fresh CSV decoder state.