Public API for pg-datahike — PostgreSQL access for Datahike.
pg-datahike embeds a PG-compatible adapter inside a Datahike process:
wire protocol, SQL translator, virtual pg_* / information_schema
catalogs, constraint enforcement, schema hints. Clients that speak PG
(pgjdbc, psql, psycopg2, Rails AR, Hibernate, Odoo, Flyway, …) talk
to a Datahike database without a PG install. This namespace is the
stable surface; `datahike.pg.*` sub-namespaces hold implementation
details that may be reorganized without warning.
## Starting a server
Single database:
(require '[datahike.pg :as pg]
'[datahike.api :as d])
(d/create-database {:store {:backend :mem :id "demo"}
:schema-flexibility :write})
(def conn (d/connect {:store {:backend :mem :id "demo"}}))
(def srv (pg/start-server conn {:port 5432}))
;; … clients connect to localhost:5432 …
(pg/stop-server srv)
Multiple databases served from one server — clients route via the
JDBC URL's database name (jdbc:postgresql://host:5432/prod):
(def srv (pg/start-server {"prod" prod-conn
"staging" staging-conn}
{:port 5432}))
Unknown database names are rejected with SQLSTATE 3D000.
Handlers are created per incoming connection by the server. For
direct in-process use (tests, REPL), bypass the wire layer:
(def h (pg/make-query-handler conn))
(.execute h "SELECT 1")
## Schema hints for native Datahike databases
A pre-existing Datahike database works immediately — attributes in
the schema become virtual tables. Hints let you customize that view
without changing the underlying schema:
(pg/set-hint! conn :person/full_name {:column "name"})
(pg/set-hint! conn :person/ssn {:hidden true})
(pg/set-hint! conn :person/company {:references :company/id})
- `:column` — rename the SQL column
- `:hidden` — exclude the attr from SELECT * / information_schema
- `:references` — FK target for `:db.type/ref` attrs (makes
`JOIN c ON p.fk = c.pk` work without `db_id`)
- `:table` — rename the SQL table for the attr's namespace
## Compat / opt-out
The handler rejects unsupported DDL (GRANT, REVOKE, POLICY,
ROW LEVEL SECURITY, CREATE EXTENSION, COPY) by default with
SQLSTATE 0A000. For apps that can't be retrofitted to avoid
emitting those — most ORMs — accept them silently per-feature or
by named preset:
(pg/make-query-handler conn {:compat :permissive})
(pg/make-query-handler conn {:silently-accept #{:grant :policy}})
See `datahike.pg.server/compat-presets` for the preset bundles.
## Temporal queries
Datahike's `as-of` / `since` / `history` are exposed through SET:
SET datahike.as_of = '2024-01-15T00:00:00Z';
SET datahike.since = '2024-01-01T00:00:00Z';
SET datahike.history = 'true';
RESET datahike.as_of;
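As an illustration, the SET values above are plain ISO-8601 instants; a handler might parse them like this (a sketch — `parse-temporal-value` and the wiring comment are illustrative, not part of the public API):

```clojure
(ns example.temporal
  (:import [java.time Instant]
           [java.util Date]))

(defn parse-temporal-value
  "Parse a SET datahike.as_of / datahike.since string value into the
  java.util.Date that Datahike's as-of / since accept."
  [^String s]
  (Date/from (Instant/parse s)))

;; Wiring sketch (datahike.api aliased as d; history is a boolean flag):
;;   (cond-> db
;;     as-of-str (d/as-of  (parse-temporal-value as-of-str))
;;     since-str (d/since  (parse-temporal-value since-str))
;;     history?  d/history)
```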
## Extending catalog surfaces
Libraries that expose app-level metadata (audit tables, health
checks, per-tenant views) can register additional virtual catalog
tables. Entries are {:schema [Datahike-schema entries including a
row-marker attr] :data-fn (fn [user-schema cte-db] rows)}.
(pg/register-catalog-table!
"app_metrics"
{:schema [{:db/ident :app_metrics/key …}
{:db/ident :app_metrics/val …}
{:db/ident (datahike.pg.schema/row-marker-attr
"app_metrics") …}]
:data-fn (fn [_us _db] (current-metric-rows))})
The table is then queryable via SQL: `SELECT * FROM app_metrics`.
Structural SQL classifier — routes statements to the right handler
before JSqlParser sees them.
Kills the regex sprawl across `system-query?`, `parse-sql`'s CT6
guard, and the arg-extraction regexes (savepoint names, advisory-
lock keys, temporal SET values) by feeding those sites a real
tokenizer.
The tokenizer handles enough of PG lexical syntax to classify
correctly in the face of keyword-inside-a-string, keyword-inside-
a-comment, dollar-quoted strings, and case mix. It does NOT parse
expressions — it yields a flat token stream for a keyword-dispatch
classifier.
Classifier output:
{:kind <statement-kind keyword>
:name <savepoint/variable name if relevant>
:args <vector of literal args — advisory keys, pg_sleep dur>
:var <SET/RESET/SHOW variable name>
:value <SET value (string) when captured>
:reject-kind <opt-out knob key>
:tag <synthetic command tag on silent-accept>}
:kind :generic-sql means 'pass to JSqlParser unchanged'.
Non-goals: full PG lexer, expression parsing. The token-driven
preprocess-sql rewriter lives in datahike.pg.rewrite and consumes
this tokenizer's output.
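A minimal sketch of the tokenize-then-dispatch shape (illustrative only — the real tokenizer also handles dollar-quoted strings, block comments, and operators, and the classifier covers far more statement kinds):

```clojure
(ns example.classify
  (:require [clojure.string :as str]))

(defn tokenize
  "Crude lexer: single-quoted strings, line comments, word tokens,
  and single-char punctuation. Enough to keep keywords inside
  strings/comments from fooling the dispatch below."
  [sql]
  (map (fn [s]
         (cond
           (str/starts-with? s "'")  {:kind :string  :text s}
           (str/starts-with? s "--") {:kind :comment :text s}
           (re-matches #"[A-Za-z_].*" s) {:kind :word :text s}
           :else {:kind :punct :text s}))
       (re-seq #"'(?:[^']|'')*'|--[^\n]*|[A-Za-z_][A-Za-z0-9_.]*|\S" sql)))

(defn classify
  "Keyword-dispatch on the leading word tokens; {:kind :generic-sql}
  means 'pass to JSqlParser unchanged'."
  [sql]
  (let [words (->> (tokenize sql)
                   (filter #(= :word (:kind %)))
                   (map (comp str/lower-case :text)))]
    (case (first words)
      "savepoint" {:kind :savepoint :name (second words)}
      "set"       {:kind :set :var (second words)}
      "reset"     {:kind :reset :var (second words)}
      nil         {:kind :empty}
      {:kind :generic-sql})))
```

Note how `(classify "SELECT 'SAVEPOINT'")` stays `:generic-sql` because the keyword sits inside a string token.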
Exception → PostgreSQL SQLSTATE classification for the pgwire layer.
Postgres clients branch on SQLSTATE codes: Odoo retries 40001
(serialization_failure) and 40P01 (deadlock_detected); ORMs map 23505
to 'unique violation', 23502 to 'not-null violation', 22P02 to 'invalid
input syntax', etc. Returning XX000 or 42000 for everything makes these
signals invisible.
Lookup order when classifying a Throwable:
1. ex-data `:sqlstate` (explicit override wins).
2. ex-data `:datahike/canceled` → "57014" (query_canceled). Datahike
core stays postgres-agnostic — it raises a plain
`{:datahike/canceled true}` ex-info; this layer translates to the
wire code at the boundary.
3. ex-data `:error` keyword, via `dh-error->sqlstate`.
4. Regex pattern match on the exception message (for Datahike's
common unstructured errors that don't set `:error`).
5. Fallback: "XX000" (internal_error).
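The cascade can be sketched as follows (the `dh-error->sqlstate` entries and message patterns shown are illustrative, not the full real mappings):

```clojure
(ns example.sqlstate)

;; Illustrative subset of the :error keyword → SQLSTATE table.
(def dh-error->sqlstate
  {:transact/unique-violation "23505"
   :transact/not-null         "23502"})

;; Hypothetical patterns for common unstructured Datahike messages.
(def message-patterns
  [[#"(?i)unique constraint" "23505"]
   [#"(?i)cannot be cast"    "22P02"]])

(defn sqlstate-for
  "Classify a Throwable into a SQLSTATE, following the lookup order."
  [^Throwable t]
  (let [data (ex-data t)]
    (or (:sqlstate data)                          ;; 1. explicit override
        (when (:datahike/canceled data) "57014")  ;; 2. query_canceled
        (dh-error->sqlstate (:error data))        ;; 3. structured :error
        (some (fn [[re code]]                     ;; 4. message regexes
                (when (some->> (ex-message t) (re-find re)) code))
              message-patterns)
        "XX000")))                                ;; 5. internal_error
```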
Canonical code list: postgres/src/backend/utils/errcodes.txt.
PostgreSQL jsonb type support for the PgWire compatibility layer.
Stores jsonb values as Clojure data structures (maps, vectors, strings, numbers, booleans, nil) serialized to/from JSON strings via jsonista.
Implements PostgreSQL jsonb operators and functions as pure Clojure functions that can be used as Datalog predicates or post-processing.
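For instance, PostgreSQL's `@>` containment becomes a pure recursive predicate over the parsed data (a simplified sketch; it ignores PG's top-level array-contains-scalar special case):

```clojure
(ns example.jsonb)

(defn jsonb-contains?
  "Sketch of jsonb @> on already-parsed JSON data: maps contain maps
  key-by-key, vectors contain every element of the right-hand vector,
  scalars must be equal."
  [left right]
  (cond
    (and (map? left) (map? right))
    (every? (fn [[k v]] (and (contains? left k)
                             (jsonb-contains? (get left k) v)))
            right)

    (and (vector? left) (vector? right))
    (every? (fn [rv] (some #(jsonb-contains? % rv) left)) right)

    :else (= left right)))
```

A predicate of this shape drops straight into a Datalog where clause or a post-processing filter.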
Token-driven SQL source rewrites. Normalize SQL before JSqlParser
sees it by excising or injecting source-level spans — all based on
positions captured by the datahike.pg.classify tokenizer.
Each rule is a pure function `(tokens) -> seq of spans`, where a
span is `[start end replacement]`. The rewriter applies all
non-overlapping spans right-to-left (so earlier offsets stay
stable) and returns the new SQL string.
Why this exists: the previous preprocess-sql was a pile of regex
`str/replace` calls that could false-positive on keywords inside
string literals, dollar-quotes, or comments (`SELECT 'REFERENCES'`
was vulnerable to the inline-REFERENCES stripper). Token-based
rules only match tokens of the right kind, so a keyword inside a
:string or :comment is invisible to them.
Callers: sql/preprocess-sql.
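The right-to-left span application can be sketched in a few lines (assuming an exclusive end offset, which the span format above does not pin down):

```clojure
(ns example.rewrite)

(defn apply-spans
  "Apply non-overlapping [start end replacement] spans to sql,
  right-to-left so earlier offsets stay valid. end is exclusive."
  [^String sql spans]
  (reduce (fn [^String s [start end replacement]]
            (str (subs s 0 start) replacement (subs s end)))
          sql
          (sort-by first > spans)))
```

Because spans are applied from the highest start offset down, each edit leaves every lower offset untouched, so no re-indexing pass is needed.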
Derives virtual PostgreSQL table definitions from Datahike schemas.
Maps attribute namespace prefixes to table names and attribute
local names to column names. For example:
  :person/name → table 'person', column 'name'
  :person/age  → table 'person', column 'age'
Every virtual table gets an implicit 'db_id' column (the entity ID).
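The mapping is mechanical — a sketch (`attr->table-column` and `schema->tables` are illustrative names, not the namespace's API):

```clojure
(ns example.schema)

(defn attr->table-column
  "Map a Datahike attribute keyword onto its virtual SQL table and
  column, following the namespace-prefix convention."
  [attr]
  {:table (namespace attr) :column (name attr)})

(defn schema->tables
  "Group attribute keywords into {table [columns]}, prepending the
  implicit db_id entity-ID column every virtual table gets."
  [attrs]
  (reduce (fn [acc attr]
            (let [{:keys [table column]} (attr->table-column attr)]
              (update acc table (fnil conj ["db_id"]) column)))
          {}
          attrs))
```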
PostgreSQL wire protocol server for Datahike.
Starts a pgwire-compatible server that accepts SQL queries from standard
PostgreSQL clients (psql, DBeaver, JDBC, Python/psycopg2, etc.) and
translates them to Datahike Datalog queries.
Usage:
(require '[datahike.api :as d])
(require '[datahike.pg.server :as pg])
(def conn (d/connect cfg))
(def server (pg/start-server conn {:port 5432}))
;; ... use any PostgreSQL client ...
(pg/stop-server server)
Structural shape matching for long-form SELECT statements.
A companion to datahike.pg.classify: classify routes by the first
few tokens; shape answers 'what does the rest of this SELECT look
like?'. Used by sql/system-query?* to identify pgjdbc + Odoo
catalog probes that can't be recognized by a leading keyword —
their identifying signal lives deep in the projection list, the
FROM clause, or buried qualified identifiers.
This replaces the substring-match branches the classifier
previously fell back to (str/includes? on lowercased SQL).
Substring matching is safe when the needle is a distinctive
SQL-only string like 'fk.conname as name', but it's a different
model than the rest of the pipeline: a string literal containing
that phrase would false-match. Tokenizing and matching on
structural features (qualified references, AS aliases, function
names) is immune to keyword-inside-string / keyword-inside-
comment hostile inputs.
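The idea can be sketched as feature extraction over a crude token stream (illustrative helpers; the real matcher extracts richer features):

```clojure
(ns example.shape
  (:require [clojure.string :as str]))

(defn code-tokens
  "Identifiers only — string literals are matched and discarded, so
  their contents can never contribute features."
  [sql]
  (->> (re-seq #"'(?:[^']|'')*'|[A-Za-z_][A-Za-z0-9_.]*" sql)
       (remove #(str/starts-with? % "'"))
       (map str/lower-case)))

(defn as-aliases
  "Pairs of [expr alias] harvested from '<token> AS <token>' runs."
  [tokens]
  (for [[a kw b] (partition 3 1 tokens)
        :when (= kw "as")]
    [a b]))

(defn fk-conname-probe?
  "True when the projection structurally contains fk.conname AS name —
  the same text inside a string literal cannot match."
  [sql]
  (boolean (some #{["fk.conname" "name"]}
                 (as-aliases (code-tokens sql)))))
```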
API:
(catalog-probe sql) → :get-fk-conname | :get-primary-keys
| :get-field-metadata | :empty-catalog | nil
SQL → Datahike Datalog translator.
Parses SQL strings using JSqlParser and translates the AST into Datahike
Datalog queries that can be executed by `datahike.api/q`.
The core mapping: attribute namespace prefixes become virtual table names.
:person/name → table 'person', column 'name'
:person/age → table 'person', column 'age'
Main entry points:
(parse-sql sql schema) → {:type :select :query {...} :args [...]}
| {:type :insert :tx-data [...]}
| {:type :system :result QueryResult}
| {:type :error :message str}
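A caller dispatches on the result's `:type` — a sketch where `run-query` / `run-tx` stand in for `d/q` / `d/transact`, and the 42601 (syntax_error) choice is illustrative:

```clojure
(ns example.dispatch)

(defn execute-parsed
  "Route a parse-sql result map to the right execution path."
  [result run-query run-tx]
  (case (:type result)
    :select (apply run-query (:query result) (:args result))
    :insert (run-tx (:tx-data result))
    :system (:result result)
    :error  (throw (ex-info (:message result) {:sqlstate "42601"}))))
```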
Virtual PostgreSQL catalog materialization + system-query routing.
Datahike has no real `pg_catalog.*` or `information_schema.*` tables —
pgwire synthesizes them on demand from the user schema. Every
catalog table the pg_* ecosystem expects is a `{:schema [...]
:data-fn (fn [user-schema cte-db] ...)}` entry that builds its row
set lazily when a query references it.
The registry is extensible at runtime: libraries built on top of
pg-datahike can add their own virtual catalog tables via
`register-catalog-table!` (e.g. Odoo's internal metadata tables,
a `pg_stat_activity`-style probe, etc.) without modifying this
namespace.
Two public entry points:
- `register-catalog-table!` / `unregister-catalog-table!`
— the extension seam
- `extract-empty-catalog-shape` / `system-query?`
— called by the wire handler to short-circuit common boot probes
(pgjdbc's field-metadata, Hibernate's feature detection) into
fast paths before JSqlParser even runs.
Translation context for the SQL → Datalog translator.
The `ctx` is an immutable map of per-translation atoms that every
`translate-*` fn threads through. Atoms capture the side-effecting
state (fresh-var counter, collected where clauses, entity-var
bindings, prepared-statement placeholders, …) while the outer map
stays shareable with nested sub-translations.
Fn inventory:
- `resolve-column` / `resolve-inherited-attr` — map a JSqlParser
  `Column` reference to a Datahike attribute keyword, following the
  table-alias map and PostgreSQL INHERITS semantics.
- `make-ctx` — build a fresh context.
- `fresh-var!`, `entity-var!`, `add-clause!`, `col-var!` —
  primitives the translators call to allocate logic variables,
  assign entity bindings, append where clauses, and lazily
  produce `get-else`-backed column bindings.
- `materialize-arg!` — bind a composite expression to a fresh var.
- `null-guard-clauses` / `make-columns-optional!` — SQL
  3-valued-logic helpers.
- `collect-vars` — recursively gather `?v`-style symbols out of
  a translated form.
Fns originally marked `^:private` in sql.clj are promoted to
public here so the extracted translate-* namespaces can reach
them without re-exporting through the top-level sql ns.
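The atoms-in-an-immutable-map pattern looks roughly like this (a simplified sketch with only three of the slots):

```clojure
(ns example.ctx)

(defn make-ctx
  "Per-translation context: an immutable map whose mutable slots are
  atoms, so nested sub-translations can share the same state."
  []
  {:var-counter   (atom 0)
   :where-clauses (atom [])
   :entity-vars   (atom {})})

(defn fresh-var!
  "Allocate a fresh ?v-style logic variable."
  [ctx]
  (symbol (str "?v" (swap! (:var-counter ctx) inc))))

(defn add-clause!
  "Append a where clause; returns the clause for threading."
  [ctx clause]
  (swap! (:where-clauses ctx) conj clause)
  clause)
```

Because the outer map never changes, a sub-translation can copy it, swap out one or two atoms, and still share the rest with its parent.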
DDL translation: CREATE TABLE, CREATE SEQUENCE, constraint extraction.
Produces Datahike tx-data that installs schema attributes (`:db/ident`,
`:db/valueType`, `:db/cardinality`, `:db/unique`, `:pg/type`, and the
pgwire-specific `:pg/*` constraint metadata).
The predicate helpers (`identity-column?`, `column-is-primary-key?`,
`column-is-unique?`, `column-is-not-null?`, `specs-contain-seq?`,
`column-check-expr-text`, `column-default-spec`) walk JSqlParser's
`ColumnDefinition.getColumnSpecs` lists, which are raw strings — so
each predicate is a case-insensitive token matcher.
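The shared matching shape can be sketched like this (a sketch of the idea — the real helpers carry the same names but this implementation is illustrative):

```clojure
(ns example.ddl
  (:require [clojure.string :as str]))

(defn specs-contain-seq?
  "Case-insensitive contiguous-subsequence match over a column's raw
  getColumnSpecs strings."
  [specs wanted]
  (let [specs  (mapv str/upper-case specs)
        wanted (mapv str/upper-case wanted)
        n      (count wanted)]
    (boolean
     (some #(= wanted (subvec specs % (+ % n)))
           (range (inc (- (count specs) n)))))))

(defn column-is-not-null? [specs]
  (specs-contain-seq? specs ["NOT" "NULL"]))

(defn column-is-primary-key? [specs]
  (specs-contain-seq? specs ["PRIMARY" "KEY"]))
```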
Entry points:
- translate-create-table — main; returns {:type :ddl :tx-data [...]}
- translate-create-sequence — CREATE SEQUENCE name [...]
Helpers:
- extract-ddl-constraints — CHECK / NOT NULL / UNIQUE / FK clauses
- extract-inherits — PostgreSQL INHERITS
Expression + predicate translation.
Converts JSqlParser expression nodes (arithmetic, comparisons,
CASE, CAST, function calls, IN / BETWEEN / EXISTS subqueries,
JSON operators, …) into Datalog find-element expressions and
where-clause predicates.
Two families of entry points:
- Expression side (`translate-expr`): evaluates to a value or
  binds a var. Produces `?v` symbols via materialization when the
  result feeds into a SELECT projection, ORDER BY key, aggregate
  argument, or outer SQL function call.
- Predicate side (`translate-predicate`): produces a vec of
  Datalog where-clauses that filter rows. Recursively handles
  AND / OR / NOT, comparison ops with three-valued-logic null
  guards, IN-list / IN-subquery, BETWEEN, IS NULL, EXISTS and
  NOT EXISTS (including the correlated-subquery path with
  entity-var unification + not-join fallback), LIKE / ILIKE,
  JSON operators (@>, <@, ?, ?|, ?&), and regex match.
`translate-expr` and `translate-predicate` are mutually recursive
through `translate-case-expr` (WHEN predicates) and
`translate-comparison` (operand expressions). Everything lives in
one namespace so the recursion is a regular in-ns `declare` and
doesn't cross any ns boundary.
Back-edge to sql.clj: the EXISTS / IN-subquery branches need to
recursively re-enter the parser for sub-SQL. Rather than import
`datahike.pg.sql` (which would create a load cycle), we receive
`parse-sql` through the ctx's `:parse-sql` slot — populated by
`datahike.pg.sql/parse-sql` at top-level ctx construction and
inherited by sub-ctxs (for CTE / correlated subquery / derived
tables) when they copy from their outer ctx.
Every translate-* fn takes a ctx first-argument (built by
`datahike.pg.sql.ctx/make-ctx`) and returns either a value, a
fresh var, or a vector of where-clauses (for the predicate side).
Side effects flow through the ctx's atoms (`:where-clauses`,
`:in-args`, `:param-placeholders`, `:entity-vars`, `:col->var`).
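The three-valued-logic guard on the predicate side can be sketched as clause emission around the `:__null__` sentinel (simplified; the real emitter also handles nil operands and operand materialization):

```clojure
(ns example.expr)

(def null-sentinel :__null__)

(defn comparison-clauses
  "Emit Datalog where-clauses for (op ?a ?b): a SQL comparison only
  matches when neither operand is NULL, so guard clauses filter the
  sentinel before the comparison runs."
  [op a-var b-var]
  [[(list 'not= a-var null-sentinel)]
   [(list 'not= b-var null-sentinel)]
   [(list op a-var b-var)]])
```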
SQL → Clojure function wrappers.
Every named function the translator can emit at query time lives
here, behind a stable (fully-qualified) symbol that Datahike's
`resolve-fn` looks up at execute time:
- Aggregate wrappers (`filter-sum` / `filter-avg` / …) — variants of
the core Datahike aggregates that skip the `:__null__` sentinel and
treat nil as SQL NULL. Needed because core's `sum`/`avg` throw on
keyword arithmetic and `min`/`max` mis-order keyword vs number.
- `null-safe` — wraps a scalar fn so either-operand-is-NULL returns
NULL. Used to adapt Clojure string/math fns (upper, length,
arithmetic) to SQL 3-valued logic.
- SQL string fns (`sql-lpad`, `sql-position`, …) — thin wrappers
with standard SQL semantics (1-based positions, padding behavior).
- PG-catalog-function stubs (`pg-table-is-visible`, `pg-format-type`,
`pg-get-expr`) — minimum viable to make JDBC drivers and ORMs
that probe the catalog not trip over missing functions.
- Lookup tables:
* `sql-aggregate->datalog` — SQL agg name → fully-qualified
symbol. Emitted symbol resolves via
`datahike.pg.sql/filter-*` which re-exports from this ns.
* `sql-fn->clj-fn` — SQL fn name → actual IFn value. Used by
`translate-function-call` at emit time to wrap in `null-safe`.
Runtime symbols re-exported from `datahike.pg.sql` so the
translator's emitted forms keep the old qualified path
(`'datahike.pg.sql/filter-count`, `'datahike.pg.sql/sql-+`, …)
for backward compat with already-prepared statements cached on
clients.
Prepared-statement parameter substitution and PG OID inference.
Three concerns live together here because they all deal with the
`?` / `$N` placeholder lifecycle:
1. ParamRef record + `param-ref?`. Emitted by translators at Parse
   time inside tx-data / query structures, replaced by real values
   at Execute time via `substitute-params`.
2. `*bound-params*` dynamic var: when bound to a 1-indexed vector
   of resolved values, translator branches (e.g. the JdbcParameter
   expression) resolve placeholders in-line instead of emitting
   ParamRef. This lets the same translator body serve both
   prepared-Parse (returns structured AST with ParamRefs) and
   re-translation-during-Execute (returns fully-bound form).
3. OID inference (`infer-param-oid-for-column`,
   `insert-param-oids`, `update-param-oids`, `where-param-oids`) —
   walks a JSqlParser AST, maps each placeholder index to the PG
   type OID of the column it's bound against. Consumed by
   describeParams to populate the ParameterDescription message so
   pgjdbc / psycopg2 / etc. size their binary binds correctly.
`*parse-db*` is bound by parse-sql (higher-level) to the current
db snapshot so OID inference can consult :pg/type metadata that
(:schema db) doesn't surface.
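The ParamRef half of the lifecycle can be sketched with a structural walk (a sketch — the real record and substitution carry more bookkeeping):

```clojure
(ns example.params
  (:require [clojure.walk :as walk]))

;; Emitted at Parse time wherever a ? / $N placeholder appears.
(defrecord ParamRef [index])

(defn param-ref? [x] (instance? ParamRef x))

(defn substitute-params
  "Replace every ParamRef in a translated form with its value from a
  1-indexed parameter vector (the vector itself is 0-indexed)."
  [form params]
  (walk/postwalk
   (fn [x]
     (if (param-ref? x)
       (nth params (dec (:index x)))
       x))
   form))
```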
Statement-level translation: SELECT / INSERT / UPDATE / DELETE / CTE.
The top half of this namespace (translate-select + its join / HAVING / materialization machinery) maps a PlainSelect AST to a Datalog query-map plus row-formatting metadata the handler uses at execute time.
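To make the shape concrete, here is a hypothetical before/after, assuming `:person/name` and `:person/age` attributes; the exact query-map the real translator emits may differ in variable naming and metadata keys.

```clojure
;; SELECT name FROM person WHERE age > 30
;; translates (roughly) to a Datalog query-map like:
(def query-map
  '{:find  [?name]
    :where [[?e :person/name ?name]
            [?e :person/age ?age]
            [(> ?age 30)]]})
;; ...plus row-formatting metadata (column names, type OIDs) that the
;; handler consumes at execute time to build RowDescription / DataRow.
```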
The middle half handles INSERT / UPDATE / DELETE: extract-value
lifts JSqlParser literal/expression nodes into Clojure values,
coerce-insert-value adapts them to the target column's
:db/valueType, and translate-insert / translate-update /
translate-delete produce tx-data + (for UPDATE/DELETE) an
eids-walk query. INSERT RETURNING and UPDATE RETURNING land in
extract-returning; CHECK / UPDATE expressions evaluated
per-row at handler time go through eval-check-predicate /
eval-update-expr.
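The coercion step can be sketched as a valueType dispatch. This is a minimal sketch, assuming a `case`-based design; the real `coerce-insert-value` covers more types and reports errors via SQLSTATE.

```clojure
;; Hypothetical stand-in for coerce-insert-value: adapt a literal lifted
;; by extract-value to the target column's :db/valueType.
(import 'java.time.Instant)

(defn coerce-insert-value*
  [value-type v]
  (case value-type
    :db.type/long    (if (string? v) (Long/parseLong v) (long v))
    :db.type/double  (double v)
    :db.type/string  (str v)
    :db.type/boolean (boolean v)
    :db.type/instant (if (string? v)
                       (java.util.Date/from (Instant/parse v))
                       v)
    v))                                 ; unknown types pass through

(coerce-insert-value* :db.type/long "42")   ;; => 42
(coerce-insert-value* :db.type/double 1)    ;; => 1.0
```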
The bottom half implements CTEs: translate-cte-branch
materializes one WITH-clause body against an enriched db so the
outer SELECT / DML can reference it as a virtual table;
translate-recursive-cte handles the WITH RECURSIVE form by
iterating until the CTE converges.
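The convergence loop behind `translate-recursive-cte` is a standard fixpoint iteration. The sketch below shows the idea on plain sets rather than materialized virtual tables; the function name and shape are illustrative, not the library's API.

```clojure
;; Seed with the non-recursive branch, then re-run the recursive branch
;; against everything accumulated so far until no new rows appear --
;; exactly the termination condition WITH RECURSIVE relies on.
(defn recursive-cte-fixpoint
  [seed-rows recur-step]              ; recur-step: rows-so-far -> new rows
  (loop [acc (set seed-rows)]
    (let [next (into acc (recur-step acc))]
      (if (= next acc)                ; converged
        acc
        (recur next)))))

;; Transitive closure over a parent edge, as in
;; WITH RECURSIVE ancestors AS (...):
(def parent {"c" "b", "b" "a"})
(recursive-cte-fixpoint #{"c"} (fn [rows] (keep parent rows)))
;; => #{"a" "b" "c"}
```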
The three blocks live in one namespace because they are mutually recursive:
translate-select → translate-recursive-cte (WITH in SELECT)
translate-recursive-cte → translate-select (CTE body)
translate-insert → translate-select (INSERT ... SELECT)
translate-insert → translate-recursive-cte (INSERT ... WITH)
translate-select → extract-value (scalar subqueries)
Dependencies on already-extracted namespaces are all one-way:
stmt → expr (translate-expr, translate-predicate, …)
stmt → ctx (make-ctx, col-var!, resolve-column, …)
stmt → fns (aggregate lookups)
stmt → params (ParamRef, *from-bindings*, *parse-db*)
stmt → jsonb, schema, types (type coercion + jsonb ops)
PostgreSQL type system registry for the PgWire compatibility layer.
Centralizes all type mappings between PostgreSQL OIDs, SQL type names, Datahike value types, and wire protocol format codes.
Authoritative source: PostgreSQL 19devel src/include/catalog/pg_type.dat
Four directions of mapping:
1. SQL name → Datahike type (for CREATE TABLE)
2. Datahike type → PG OID (for wire protocol RowDescription)
3. PG OID → SQL name (for format_type() and information_schema)
4. SQL name → category (for CAST type classification)
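A sketch of what such a registry looks like for a handful of types. The OID values are the stable ones from pg_type.dat, but the var names and exact map shapes here are assumptions, not the namespace's actual vars.

```clojure
;; Direction 1: SQL name -> Datahike type (CREATE TABLE)
(def sql-name->datahike
  {"boolean" :db.type/boolean, "bigint" :db.type/long,
   "text"    :db.type/string,  "double precision" :db.type/double,
   "timestamptz" :db.type/instant, "uuid" :db.type/uuid})

;; Direction 2: Datahike type -> PG OID (RowDescription)
(def datahike->oid
  {:db.type/boolean 16, :db.type/long 20, :db.type/string 25,
   :db.type/double 701, :db.type/instant 1184, :db.type/uuid 2950})

;; Direction 3: PG OID -> SQL name (format_type(), information_schema)
(def oid->sql-name
  {16 "boolean", 20 "bigint", 25 "text",
   701 "double precision", 1184 "timestamp with time zone", 2950 "uuid"})

;; Direction 4: SQL name -> category (CAST classification)
(def sql-name->category
  {"boolean" :boolean, "bigint" :numeric, "double precision" :numeric,
   "text" :string, "timestamptz" :datetime, "uuid" :uuid})

;; Round trip through the registry:
(-> "bigint" sql-name->datahike datahike->oid oid->sql-name)
;; => "bigint"
```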
Window function post-processing engine.
Operates on query result rows (vectors of values) after the main Datalog query has executed. Computes ROW_NUMBER, RANK, DENSE_RANK, NTILE, PERCENT_RANK, CUME_DIST, running SUM/AVG/COUNT/MIN/MAX, and LAG/LEAD.
Architecture: matches Stratum's window.clj interface (same spec maps) but uses row-based processing instead of columnar typed arrays.
Window spec format:
{:op :row-number/:rank/:dense-rank/:ntile/:sum/:avg/:count/:min/:max/:lag/:lead
 :col-idx int — column index for aggregate/offset ops (nil for ranking)
 :partition-by [idx ...] — column indices for partitioning
 :order-by [[idx :asc/:desc] ...] — column indices + direction for sorting
 :frame {:type :rows/:range :start bound :end bound}
 :offset int — LAG/LEAD offset (default 1)
 :default val — LAG/LEAD default value
 :ntile-n int} — NTILE bucket count
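As an illustration of the row-based processing, here is a minimal ROW_NUMBER over plain row vectors using the spec keys above. The function is a hypothetical sketch: it honors only the `:order-by` column indices (descending direction handling is omitted for brevity) and is not the engine's actual code.

```clojure
;; ROW_NUMBER() OVER (PARTITION BY p ORDER BY o), row-based:
(defn row-number
  [rows {:keys [partition-by order-by]}]
  (->> rows
       (map-indexed vector)                          ; remember input position
       (group-by (fn [[_ row]] (mapv row partition-by)))
       (mapcat (fn [[_ part]]
                 (let [sorted (sort-by (fn [[_ row]]
                                         (mapv row (map first order-by)))
                                       part)]
                   ;; number rows 1..n within the partition
                   (map-indexed (fn [n [pos _]] [pos (inc n)]) sorted))))
       (sort-by first)                               ; restore input order
       (mapv second)))

;; rows are [dept salary]:
(row-number [["a" 10] ["a" 30] ["b" 5] ["a" 20]]
            {:partition-by [0] :order-by [[1 :asc]]})
;; => [1 3 1 2]
```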