Liking cljdoc? Tell your friends :D

datahike.pg.sql.template

Lexical INSERT-VALUES templater for the parse-sql fast path.

Bulk-load workloads (pg_dump replay, ETL pipelines, log ingestion) repeatedly issue INSERT statements with the same structural shape but varying literal values. The full parse-sql pipeline — JSqlParser AST + pg-datahike SQL→Datalog translation — costs ~1 ms/call dominated by JSqlParser, so a 46k-row Pagila replay paid ~46 s on parsing alone.

The trick: every Pagila-style row of one table has the same shape, just different values. If we replace the literal values with ? placeholders, the templated SQL string is shared across all rows of that table — and our parse-sql LRU cache turns every-row-but-the-first into a sub-µs cache hit. We only need a fast lexical scan to extract literals; we don't need a real parser.

This namespace exports:

  • template-insert-sql — pure string transform; returns {:templated <sql> :literals [<tok>...]} or nil if the SQL doesn't match the simple INSERT VALUES shape we know how to template.
  • parse-literal-token — best-guess Java type for one captured literal (Long, Double, String, nil, Boolean, …).
  • typed-substitute — replace ParamRefs in a parsed result with values coerced through coerce-insert-value so types match each column's :db/valueType.

Anything outside the simple INSERT VALUES shape returns nil from template-insert-sql so the caller falls through to full parse-sql. Conservative: false negatives (missed templating) degrade gracefully to the existing path; we never produce wrong tx-data.

Lexical INSERT-VALUES templater for the parse-sql fast path.

Bulk-load workloads (`pg_dump` replay, ETL pipelines, log
ingestion) repeatedly issue INSERT statements with the same
structural shape but varying literal values. The full parse-sql
pipeline — JSqlParser AST + pg-datahike SQL→Datalog translation
— costs ~1 ms/call dominated by JSqlParser, so a 46k-row Pagila
replay paid ~46 s on parsing alone.

The trick: every Pagila-style row of one table has the *same*
shape, just different values. If we replace the literal values
with `?` placeholders, the templated SQL string is shared across
all rows of that table — and our parse-sql LRU cache turns
every-row-but-the-first into a sub-µs cache hit. We only need a
fast lexical scan to extract literals; we don't need a real
parser.

This namespace exports:

  - `template-insert-sql`  — pure string transform; returns
                             `{:templated <sql> :literals [<tok>...]}`
                             or nil if the SQL doesn't match the
                             simple INSERT VALUES shape we know
                             how to template.
  - `parse-literal-token`  — best-guess Java type for one captured
                             literal (Long, Double, String, nil,
                             Boolean, …).
  - `typed-substitute`     — replace ParamRefs in a parsed result
                             with values coerced through
                             `coerce-insert-value` so types match
                             each column's `:db/valueType`.

Anything outside the simple INSERT VALUES shape returns nil from
`template-insert-sql` so the caller falls through to full
parse-sql. Conservative: false negatives (missed templating)
degrade gracefully to the existing path; we never produce wrong
tx-data.
raw docstring

parse-literal-tokenclj

(parse-literal-token tok)

Convert a captured literal token to a 'best-guess' Java value:

  • numeric tokens → Long or Double
  • 'string'[::cast] → String (cast suffix discarded; the templated parse already encoded the cast structurally, and the typed-substitute pass below pipes the value through coerce-insert-value against the column's :db/valueType)
  • NULL / TRUE / FALSE → nil / true / false
  • anything else → a sentinel that signals the caller to fall back to full parse-sql.

Returns the sentinel rather than throwing so a single odd token in a long INSERT cleanly bails to the slow path instead of crashing the whole parse.

Convert a captured literal token to a 'best-guess' Java value:
  - numeric tokens → Long or Double
  - 'string'[::cast] → String (cast suffix discarded; the
    templated parse already encoded the cast structurally, and
    the typed-substitute pass below pipes the value through
    `coerce-insert-value` against the column's :db/valueType)
  - NULL / TRUE / FALSE → nil / true / false
  - anything else → a sentinel that signals the caller to fall
    back to full parse-sql.

Returns the sentinel rather than throwing so a single odd token
in a long INSERT cleanly bails to the slow path instead of
crashing the whole parse.
sourceraw docstring

template-insert-sqlclj

(template-insert-sql sql)

Lexically transform an INSERT statement of the shape

INSERT INTO t [( cols )] VALUES (lit, …) [, (lit, …)] [tail]

into a templated form with ? placeholders, capturing the original literals in declaration order. Returns

{:templated <sql> :literals [<token>, ...]}

on success, or nil if the SQL doesn't match the simple shape (no VALUES keyword, ON CONFLICT clause, malformed paren structure, etc.). The caller should fall through to the full parser on nil.

Tail content (e.g. RETURNING …) is preserved verbatim — we stop templating once we run out of (…) tuples and copy the rest of the string as-is.

Lexically transform an INSERT statement of the shape

  INSERT INTO t [( cols )] VALUES (lit, …) [, (lit, …)] [tail]

into a templated form with `?` placeholders, capturing the
original literals in declaration order. Returns

  {:templated <sql>  :literals [<token>, ...]}

on success, or nil if the SQL doesn't match the simple shape (no
VALUES keyword, ON CONFLICT clause, malformed paren structure,
etc.). The caller should fall through to the full parser on nil.

Tail content (e.g. `RETURNING …`) is preserved verbatim — we
stop templating once we run out of `(…)` tuples and copy the
rest of the string as-is.
sourceraw docstring

templater-fail?clj

(templater-fail? v)

True if parse-literal-token returned its bail-out sentinel. The integration layer treats this as 'fall through to full parse-sql for this statement'.

True if `parse-literal-token` returned its bail-out sentinel.
The integration layer treats this as 'fall through to full
parse-sql for this statement'.
sourceraw docstring

typed-substituteclj

(typed-substitute parsed literals schema)

Replace ParamRefs in parsed.tx-data with bound values from literals, coercing each via stmt/coerce-insert-value against the column's :db/valueType so string→instant, string→long, etc. land at the same shape the literal-SQL parse would have produced.

Returns nil if any literal can't be parsed cleanly — the caller uses that as the signal to fall through to full parse-sql. literals is a 0-indexed vector of captured token strings (as returned by template-insert-sql); ParamRef N pulls (parse-literal-token (literals (dec N))) and runs coercion.

Also rewrites every entity-map's :db/id with a fresh gensym so the cached parsed map (shared across calls) doesn't hand Datahike colliding tempids when consecutive Bind/Execute cycles land in the same wire-layer batch.

Replace ParamRefs in `parsed.tx-data` with bound values from
`literals`, coercing each via `stmt/coerce-insert-value` against
the column's `:db/valueType` so string→instant, string→long,
etc. land at the same shape the literal-SQL parse would have
produced.

Returns nil if any literal can't be parsed cleanly — the caller
uses that as the signal to fall through to full parse-sql.
`literals` is a 0-indexed vector of captured token strings (as
returned by `template-insert-sql`); ParamRef N pulls
`(parse-literal-token (literals (dec N)))` and runs coercion.

Also rewrites every entity-map's :db/id with a fresh gensym so
the cached parsed map (shared across calls) doesn't hand
Datahike colliding tempids when consecutive Bind/Execute cycles
land in the same wire-layer batch.
sourceraw docstring

cljdoc builds & hosts documentation for Clojure/Script libraries

Keyboard shortcuts
Ctrl+kJump to recent docs
Move to previous article
Move to next article
Ctrl+/Jump to the search field
× close