Lexical INSERT-VALUES templater for the parse-sql fast path.
Bulk-load workloads (`pg_dump` replay, ETL pipelines, log
ingestion) repeatedly issue INSERT statements with the same
structural shape but varying literal values. The full parse-sql
pipeline — JSqlParser AST + pg-datahike SQL→Datalog translation
— costs ~1 ms/call dominated by JSqlParser, so a 46k-row Pagila
replay paid ~46 s on parsing alone.
The trick: every Pagila-style row of one table has the *same*
shape, just different values. If we replace the literal values
with `?` placeholders, the templated SQL string is shared across
all rows of that table — and our parse-sql LRU cache turns
every-row-but-the-first into a sub-µs cache hit. We only need a
fast lexical scan to extract literals; we don't need a real
parser.
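Concretely, the transformation might look like this (table, columns, and values are illustrative; the exact spelling of the captured literal tokens is an implementation detail):

```clojure
;; Row 1: INSERT INTO film (film_id, title) VALUES (1, 'Alien')
;; Row 2: INSERT INTO film (film_id, title) VALUES (2, 'Brazil')
;;
;; Both rows template to the same string: one real parse,
;; then a cache hit for every remaining row of the table.
(template-insert-sql "INSERT INTO film (film_id, title) VALUES (1, 'Alien')")
;; => {:templated "INSERT INTO film (film_id, title) VALUES (?, ?)"
;;     :literals  ["1" "'Alien'"]}
```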
This namespace exports:
- `template-insert-sql` — pure string transform; returns
`{:templated <sql> :literals [<tok>...]}`
or nil if the SQL doesn't match the
simple INSERT VALUES shape we know
how to template.
- `parse-literal-token` — best-guess Java type for one captured
literal (Long, Double, String, nil,
Boolean, …).
- `typed-substitute` — replace ParamRefs in a parsed result
with values coerced through
`coerce-insert-value` so types match
each column's `:db/valueType`.
Anything outside the simple INSERT VALUES shape returns nil from
`template-insert-sql` so the caller falls through to full
parse-sql. Conservative: false negatives (missed templating)
degrade gracefully to the existing path; we never produce wrong
tx-data.

(parse-literal-token tok)
Convert a captured literal token to a 'best-guess' Java value:
- numeric tokens → Long or Double
- 'string'[::cast] → String (cast suffix discarded; the
templated parse already encoded the cast structurally, and
the typed-substitute pass below pipes the value through
`coerce-insert-value` against the column's :db/valueType)
- NULL / TRUE / FALSE → nil / true / false
- anything else → a sentinel that signals the caller to fall
back to full parse-sql.
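A hedged sketch of that mapping (the token spellings shown are illustrative):

```clojure
(parse-literal-token "42")      ;; numeric → Long
(parse-literal-token "3.14")    ;; numeric → Double
(parse-literal-token "'abc'")   ;; quoted  → String
(parse-literal-token "NULL")    ;; → nil
(parse-literal-token "TRUE")    ;; → true
(parse-literal-token "now()")   ;; not a literal → bail-out sentinel
```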
Returns the sentinel rather than throwing so a single odd token
in a long INSERT cleanly bails to the slow path instead of
crashing the whole parse.

(template-insert-sql sql)
Lexically transform an INSERT statement of the shape
INSERT INTO t [( cols )] VALUES (lit, …) [, (lit, …)] [tail]
into a templated form with `?` placeholders, capturing the
original literals in declaration order. Returns
`{:templated <sql> :literals [<token>, ...]}`
on success, or nil if the SQL doesn't match the simple shape (no
VALUES keyword, ON CONFLICT clause, malformed paren structure,
etc.). The caller should fall through to the full parser on nil.
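For example (a sketch; the exact token form and placeholder layout are implementation details):

```clojure
;; Multi-row INSERT within the simple shape:
(template-insert-sql "INSERT INTO t (a, b) VALUES (1, 'x'), (2, 'y')")
;; => {:templated "INSERT INTO t (a, b) VALUES (?, ?), (?, ?)"
;;     :literals  ["1" "'x'" "2" "'y'"]}

;; Outside the simple shape: nil, so the caller uses full parse-sql.
(template-insert-sql "INSERT INTO t (a) VALUES (1) ON CONFLICT DO NOTHING")
;; => nil
```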
Tail content (e.g. `RETURNING …`) is preserved verbatim — we
stop templating once we run out of `(…)` tuples and copy the
rest of the string as-is.

(templater-fail? v)
True if `parse-literal-token` returned its bail-out sentinel. The integration layer treats this as 'fall through to full parse-sql for this statement'.
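For example (assuming `now()` is one of the tokens that triggers the sentinel, since it is not a plain literal):

```clojure
(templater-fail? (parse-literal-token "now()"))  ;; => true
(templater-fail? (parse-literal-token "42"))     ;; => false
```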
(typed-substitute parsed literals schema)

Replace ParamRefs in `parsed.tx-data` with bound values from
`literals`, coercing each via `stmt/coerce-insert-value` against
the column's `:db/valueType` so string→instant, string→long,
etc. land at the same shape the literal-SQL parse would have
produced.

Returns nil if any literal can't be parsed cleanly — the caller
uses that as the signal to fall through to full parse-sql.

`literals` is a 0-indexed vector of captured token strings (as
returned by `template-insert-sql`); ParamRef N pulls
`(parse-literal-token (literals (dec N)))` and runs coercion.

Also rewrites every entity-map's `:db/id` with a fresh gensym so
the cached parsed map (shared across calls) doesn't hand Datahike
colliding tempids when consecutive Bind/Execute cycles land in
the same wire-layer batch.
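Putting the three exports together, the caller's fast path might look like the following sketch (`cached-parse-sql`, `parse-sql`, and `schema` are assumed names from the surrounding integration layer, not exports of this namespace):

```clojure
(defn fast-path-insert
  "Sketch only: template, hit the parse cache, substitute typed values."
  [sql schema]
  (if-let [{:keys [templated literals]} (template-insert-sql sql)]
    (let [parsed (cached-parse-sql templated)]   ; sub-µs hit after row 1
      (or (typed-substitute parsed literals schema)
          (parse-sql sql)))                      ; odd literal: slow path
    (parse-sql sql)))                            ; shape not templatable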