
datahike.pg

Public API for pg-datahike — PostgreSQL access for Datahike.

pg-datahike embeds a PG-compatible adapter inside a Datahike process:
wire protocol, SQL translator, virtual pg_* / information_schema
catalogs, constraint enforcement, schema hints. Clients that speak PG
(pgjdbc, psql, psycopg2, Rails AR, Hibernate, Odoo, Flyway, …) talk
to a Datahike database without a PG install. This namespace is the
stable surface; `datahike.pg.*` sub-namespaces hold implementation
details that may be reorganized without warning.

## Starting a server

Single database:

  (require '[datahike.pg :as pg]
           '[datahike.api :as d])

  (d/create-database {:store {:backend :mem :id "demo"}
                      :schema-flexibility :write})
  (def conn (d/connect {:store {:backend :mem :id "demo"}}))

  (def srv (pg/start-server conn {:port 5432}))
  ;; … clients connect to localhost:5432 …
  (pg/stop-server srv)

Multiple databases served from one server — clients route via the
JDBC URL's database name (jdbc:postgresql://host:5432/prod):

  (def srv (pg/start-server {"prod"    prod-conn
                             "staging" staging-conn}
                            {:port 5432}))

Unknown database names are rejected with SQLSTATE 3D000.

Handlers are created per incoming connection by the server. For
direct in-process use (tests, REPL), bypass the wire layer:

  (def h (pg/make-query-handler conn))
  (.execute h "SELECT 1")

## Schema hints for native Datahike databases

A pre-existing Datahike database works immediately — attributes in
the schema become virtual tables. Hints let you customize that view
without changing the underlying schema:

  (pg/set-hint! conn :person/full_name {:column "name"})
  (pg/set-hint! conn :person/ssn       {:hidden true})
  (pg/set-hint! conn :person/company   {:references :company/id})

- `:column`     — rename the SQL column
- `:hidden`     — exclude the attr from SELECT * / information_schema
- `:references` — FK target for `:db.type/ref` attrs (makes
                  `JOIN c ON p.fk = c.pk` work without `db_id`)
- `:table`      — rename the SQL table for the attr's namespace

## Compat / opt-out

The handler rejects unsupported DDL (GRANT, REVOKE, POLICY,
ROW LEVEL SECURITY, CREATE EXTENSION, COPY) by default with
SQLSTATE 0A000. For apps that can't be retrofitted to stop
emitting those statements (most ORMs), the handler can silently
accept them per-feature or by named preset:

  (pg/make-query-handler conn {:compat :permissive})
  (pg/make-query-handler conn {:silently-accept #{:grant :policy}})

See `datahike.pg.server/compat-presets` for the preset bundles.

## Temporal queries

Datahike's `as-of` / `since` / `history` are exposed through SET:

  SET datahike.as_of   = '2024-01-15T00:00:00Z';
  SET datahike.since   = '2024-01-01T00:00:00Z';
  SET datahike.history = 'true';
  RESET datahike.as_of;

## Extending catalog surfaces

Libraries that expose app-level metadata (audit tables, health
checks, per-tenant views) can register additional virtual catalog
tables. Entries are {:schema [Datahike-schema entries including a
row-marker attr] :data-fn (fn [user-schema cte-db] rows)}.

  (pg/register-catalog-table!
    "app_metrics"
    {:schema [{:db/ident :app_metrics/key …}
              {:db/ident :app_metrics/val …}
              {:db/ident (datahike.pg.schema/row-marker-attr
                          "app_metrics") …}]
     :data-fn (fn [_us _db] (current-metric-rows))})

The table is then queryable via SQL: `SELECT * FROM app_metrics`.

datahike.pg.classify

Structural SQL classifier — routes statements to the right handler
before JSqlParser sees them.

Kills the regex sprawl across `system-query?`, `parse-sql`'s CT6
guard, and the arg-extraction regexes (savepoint names, advisory-
lock keys, temporal SET values) by feeding those sites a real
tokenizer.

The tokenizer handles enough of PG lexical syntax to classify
correctly in the face of keyword-inside-a-string, keyword-inside-
a-comment, dollar-quoted strings, and case mix. It does NOT parse
expressions — it yields a flat token stream for a keyword-dispatch
classifier.

Classifier output:
  {:kind        <statement-kind keyword>
   :name        <savepoint/variable name if relevant>
   :args        <vector of literal args — advisory keys, pg_sleep dur>
   :var         <SET/RESET/SHOW variable name>
   :value       <SET value (string) when captured>
   :reject-kind <opt-out knob key>
   :tag         <synthetic command tag on silent-accept>}

:kind :generic-sql means 'pass to JSqlParser unchanged'.

Non-goals: full PG lexer, expression parsing. The token-driven
preprocess-sql rewriter lives in datahike.pg.rewrite and consumes
this tokenizer's output.
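The classifier-output map above lends itself to a plain keyword dispatch. A minimal sketch — `route-statement` and the handler keywords are invented for illustration, not this namespace's actual API:

```clojure
;; Sketch only: dispatching on the classifier's :kind keyword.
;; The fn name and the returned handler keywords are hypothetical.
(defn route-statement [{:keys [kind] :as classified} sql]
  (case kind
    :savepoint   {:handler :savepoint, :name (:name classified)}
    :set-var     {:handler :set, :var (:var classified), :value (:value classified)}
    :generic-sql {:handler :jsqlparser, :sql sql}     ; pass through unchanged
    {:handler :unknown}))

(route-statement {:kind :set-var :var "datahike.as_of" :value "2024-01-15T00:00:00Z"}
                 "SET datahike.as_of = '2024-01-15T00:00:00Z'")
;; → {:handler :set, :var "datahike.as_of", :value "2024-01-15T00:00:00Z"}
```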

datahike.pg.errors

Exception → PostgreSQL SQLSTATE classification for the pgwire layer.

Postgres clients branch on SQLSTATE codes: Odoo retries 40001
(serialization_failure) and 40P01 (deadlock_detected); ORMs map 23505
to 'unique violation', 23502 to 'not-null violation', 22P02 to 'invalid
input syntax', etc. Returning XX000 or 42000 for everything makes these
signals invisible.

Lookup order when classifying a Throwable:
  1. ex-data `:sqlstate` (explicit override wins).
  2. ex-data `:datahike/canceled` → "57014" (query_canceled). Datahike
     core stays postgres-agnostic — it raises a plain
     `{:datahike/canceled true}` ex-info; this layer translates to the
     wire code at the boundary.
  3. ex-data `:error` keyword, via `dh-error->sqlstate`.
  4. Regex pattern match on the exception message (for Datahike's
     common unstructured errors that don't set `:error`).
  5. Fallback: "XX000" (internal_error).

Canonical code list: postgres/src/backend/utils/errcodes.txt.
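The five-step lookup order can be expressed as a single `or` chain. A sketch under stated assumptions — the lookup-table entry and the message regex are illustrative, not the namespace's real tables:

```clojure
;; Sketch of the documented lookup order; the table entry and the
;; regex in step 4 are hypothetical examples, not the real data.
(def dh-error->sqlstate
  {:transact/unique-violation "23505"})              ; illustrative entry only

(defn sqlstate-for [t]
  (let [data (ex-data t)]
    (or (:sqlstate data)                             ; 1. explicit override wins
        (when (:datahike/canceled data) "57014")     ; 2. query_canceled
        (dh-error->sqlstate (:error data))           ; 3. structured :error keyword
        (when (re-find #"(?i)unique" (str (ex-message t)))
          "23505")                                   ; 4. message-pattern fallback
        "XX000")))                                   ; 5. internal_error
```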

datahike.pg.jsonb

PostgreSQL jsonb type support for the PgWire compatibility layer.

Stores jsonb values as Clojure data structures (maps, vectors, strings,
numbers, booleans, nil) serialized to/from JSON strings via jsonista.

Implements PostgreSQL jsonb operators and functions as pure Clojure
functions that can be used as Datalog predicates or post-processing.
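As one example of such a pure predicate, PG's containment operator `@>` translates naturally to recursive structural matching over Clojure data. A hedged sketch — the fn name and exact semantics are illustrative, not this namespace's actual implementation:

```clojure
;; Sketch of jsonb containment (@>) over Clojure data structures.
;; Hypothetical fn, shown to illustrate the "pure Clojure predicate"
;; style this namespace describes.
(defn jsonb-contains? [haystack needle]
  (cond
    ;; every needle key must exist and recursively contain its value
    (and (map? haystack) (map? needle))
    (every? (fn [[k v]] (and (contains? haystack k)
                             (jsonb-contains? (get haystack k) v)))
            needle)

    ;; every needle element must be contained by some haystack element
    (and (vector? haystack) (vector? needle))
    (every? (fn [n] (some #(jsonb-contains? % n) haystack)) needle)

    :else (= haystack needle)))

(jsonb-contains? {"a" 1 "b" [1 2 3]} {"b" [2]}) ;; → true
```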

datahike.pg.rewrite

Token-driven SQL source rewrites. Normalize SQL before JSqlParser
sees it by excising or injecting source-level spans — all based on
positions captured by the datahike.pg.classify tokenizer.

Each rule is a pure function `(tokens) -> seq of spans`, where a
span is `[start end replacement]`. The rewriter applies all non-
overlapping spans right-to-left (so earlier offsets stay stable)
and returns the new SQL string.
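The right-to-left application step can be sketched in a few lines, assuming spans use exclusive end offsets (the fn name is hypothetical):

```clojure
;; Sketch of right-to-left span application. Assumes [start end
;; replacement] spans with exclusive end; not the real rewriter.
(defn apply-spans [sql spans]
  (reduce (fn [s [start end replacement]]
            (str (subs s 0 start) replacement (subs s end)))
          sql
          ;; highest start offset first, so earlier offsets stay stable
          (sort-by first > spans)))

(apply-spans "SELECT a REFERENCES b, c" [[9 21 "b"]])
;; → "SELECT a b, c"
```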

Why this exists: the previous preprocess-sql was a pile of regex
`str/replace` calls that could false-positive on keywords inside
string literals, dollar-quotes, or comments (`SELECT 'REFERENCES'`
was vulnerable to the inline-REFERENCES stripper). Token-based
rules only match tokens of the right kind, so a keyword inside a
:string or :comment is invisible to them.

Callers: sql/preprocess-sql.

datahike.pg.schema

Derives virtual PostgreSQL table definitions from Datahike schemas.

Maps attribute namespace prefixes to table names and attribute local names
to column names. For example:
  :person/name  → table 'person', column 'name'
  :person/age   → table 'person', column 'age'

Every virtual table gets an implicit 'db_id' column (the entity ID).
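The core of that mapping is just keyword namespace/name splitting. A minimal sketch (the fn name is invented for illustration):

```clojure
;; Sketch of the attr → table/column derivation described above;
;; the fn name is hypothetical.
(defn attr->table-column [attr]
  {:table  (namespace attr)   ; :person/name → "person"
   :column (name attr)})      ; :person/name → "name"

(attr->table-column :person/name) ;; → {:table "person", :column "name"}
```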

datahike.pg.server

PostgreSQL wire protocol server for Datahike.

Starts a pgwire-compatible server that accepts SQL queries from standard
PostgreSQL clients (psql, DBeaver, JDBC, Python/psycopg2, etc.) and
translates them to Datahike Datalog queries.

Usage:
  (require '[datahike.api :as d])
  (require '[datahike.pg.server :as pg])

  (def conn (d/connect cfg))
  (def server (pg/start-server conn {:port 5432}))
  ;; ... use any PostgreSQL client ...
  (pg/stop-server server)

datahike.pg.shape

Structural shape matching for long-form SELECT statements.

A companion to datahike.pg.classify: classify routes by the first
few tokens; shape answers 'what does the rest of this SELECT look
like?'. Used by sql/system-query?* to identify pgjdbc + Odoo
catalog probes that can't be recognized by a leading keyword —
their identifying signal lives deep in the projection list, the
FROM clause, or buried qualified identifiers.

This replaces the substring-match branches the classifier
previously fell back to (str/includes? on lowercased SQL).
Substring matching is safe when the needle is a distinctive
SQL-only string like 'fk.conname as name', but it's a different
model than the rest of the pipeline: a string literal containing
that phrase would false-match. Tokenizing and matching on
structural features (qualified references, AS aliases, function
names) is immune to keyword-inside-string / keyword-inside-
comment hostile inputs.

API:
  (catalog-probe sql) → :get-fk-conname | :get-primary-keys
                      | :get-field-metadata | :empty-catalog | nil

datahike.pg.sql

SQL → Datahike Datalog translator.

Parses SQL strings using JSqlParser and translates the AST into Datahike
Datalog queries that can be executed by `datahike.api/q`.

The core mapping: attribute namespace prefixes become virtual table names.
  :person/name  → table 'person', column 'name'
  :person/age   → table 'person', column 'age'

Main entry points:
  (parse-sql sql schema)  → {:type :select :query {...} :args [...]}
                           | {:type :insert :tx-data [...]}
                           | {:type :system :result QueryResult}
                           | {:type :error :message str}
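A caller branches on the result's `:type`. A hedged sketch — the two handler fns are stubs invented for illustration, not this library's API:

```clojure
;; Sketch: consuming parse-sql's tagged result. The handler fns
;; below are illustrative stubs, not real pg-datahike functions.
(defn- execute-select! [_conn parsed] (:query parsed))
(defn- apply-tx!       [_conn parsed] (:tx-data parsed))

(defn run-parsed! [conn {:keys [type] :as parsed}]
  (case type
    :select (execute-select! conn parsed)
    :insert (apply-tx! conn parsed)
    :system (:result parsed)                     ; pre-computed QueryResult
    :error  (throw (ex-info (:message parsed) {:sqlstate "42601"}))))
```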

datahike.pg.sql.catalog

Virtual PostgreSQL catalog materialization + system-query routing.

Datahike has no real `pg_catalog.*` or `information_schema.*` tables —
pgwire synthesizes them on demand from the user schema. Every
catalog table the pg_* ecosystem expects is a `{:schema [...]
:data-fn (fn [user-schema cte-db] ...)}` entry that builds its row
set lazily when a query references it.

The registry is extensible at runtime: libraries built on top of
pg-datahike can add their own virtual catalog tables via
`register-catalog-table!` (e.g. Odoo's internal metadata tables,
a `pg_stat_activity`-style probe, etc.) without modifying this
namespace.

Two public entry points:
- `register-catalog-table!` / `unregister-catalog-table!`
  — the extension seam
- `extract-empty-catalog-shape` / `system-query?`
  — called by the wire handler to short-circuit common boot probes
    (pgjdbc's field-metadata, Hibernate's feature detection) into
    fast paths before JSqlParser even runs.

datahike.pg.sql.ctx

Translation context for the SQL → Datalog translator.

The `ctx` is an immutable map of per-translation atoms that every
`translate-*` fn threads through. Atoms capture the side-effecting
state (fresh-var counter, collected where clauses, entity-var
bindings, prepared-statement placeholders, …) while the outer map
stays shareable with nested sub-translations.

Fn inventory:

- `resolve-column` / `resolve-inherited-attr` — map a JSqlParser
  `Column` reference to a Datahike attribute keyword, following the
  table-alias map and PostgreSQL INHERITS semantics.
- `make-ctx` — build a fresh context.
- `fresh-var!`, `entity-var!`, `add-clause!`, `col-var!` —
  primitives the translators call to allocate logic variables,
  assign entity bindings, append where clauses, and lazily
  produce `get-else`-backed column bindings.
- `materialize-arg!` — bind a composite expression to a fresh var.
- `null-guard-clauses` / `make-columns-optional!` — SQL
  3-valued-logic helpers.
- `collect-vars` — recursively gather `?v`-style symbols out of
  a translated form.

Fns originally marked `^:private` in sql.clj are promoted to
public here so the extracted translate-* namespaces can reach
them without re-exporting through the top-level sql ns.
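The atoms-in-an-immutable-map shape can be sketched as follows; `make-ctx`, `fresh-var!`, and `add-clause!` here are simplified illustrations of the pattern, not the real implementations:

```clojure
;; Sketch of the per-translation ctx pattern: an immutable map
;; whose values are atoms. Simplified, illustrative only.
(defn make-ctx []
  {:var-counter   (atom 0)
   :where-clauses (atom [])})

(defn fresh-var! [ctx]
  (symbol (str "?v" (swap! (:var-counter ctx) inc))))

(defn add-clause! [ctx clause]
  (swap! (:where-clauses ctx) conj clause))

(let [ctx (make-ctx)
      v   (fresh-var! ctx)]
  (add-clause! ctx ['?e :person/name v])
  @(:where-clauses ctx))
;; → [[?e :person/name ?v1]]
```

Because the outer map is immutable, a nested sub-translation can share (or selectively replace) individual atoms without copying state back and forth.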

datahike.pg.sql.ddl

DDL translation: CREATE TABLE, CREATE SEQUENCE, constraint extraction.

Produces Datahike tx-data that installs schema attributes (`:db/ident`,
`:db/valueType`, `:db/cardinality`, `:db/unique`, `:pg/type`, and the
pgwire-specific `:pg/*` constraint metadata).

The predicate helpers (`identity-column?`, `column-is-primary-key?`,
`column-is-unique?`, `column-is-not-null?`, `specs-contain-seq?`,
`column-check-expr-text`, `column-default-spec`) walk JSqlParser's
`ColumnDefinition.getColumnSpecs` lists, which are raw strings — so
each predicate is a case-insensitive token matcher.
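A case-insensitive token-sequence matcher over such raw spec strings might look like this sketch (the fn name mirrors the helper listed above, but the body is illustrative, not the real implementation):

```clojure
;; Sketch of a case-insensitive token-sequence matcher over raw
;; column-spec strings; illustrative, not the real helper body.
(require '[clojure.string :as str])

(defn specs-contain-seq? [specs wanted]
  (let [toks (mapv str/lower-case specs)
        w    (mapv str/lower-case wanted)
        n    (count w)]
    (boolean
     (some #(= w (subvec toks % (+ % n)))
           (range (inc (- (count toks) n)))))))

(specs-contain-seq? ["NOT" "NULL" "DEFAULT" "0"] ["not" "null"]) ;; → true
```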

Entry points:
  - translate-create-table    — main; returns {:type :ddl :tx-data [...]}
  - translate-create-sequence — CREATE SEQUENCE name [...]

Helpers:
  - extract-ddl-constraints — CHECK / NOT NULL / UNIQUE / FK clauses
  - extract-inherits        — PostgreSQL INHERITS

datahike.pg.sql.expr

Expression + predicate translation.

Converts JSqlParser expression nodes (arithmetic, comparisons, CASE,
CAST, function calls, IN / BETWEEN / EXISTS subqueries, JSON
operators, …) into Datalog find-element expressions and
where-clause predicates.

Two families of entry points:

- **Expression side** (`translate-expr`): evaluates to a value or
  binds a var. Produces `?v` symbols via materialization when the
  result feeds into a SELECT projection, ORDER BY key, aggregate
  argument, or outer SQL function call.

- **Predicate side** (`translate-predicate`): produces a vec of
  Datalog where-clauses that filter rows. Recursively handles
  AND / OR / NOT, comparison ops with three-valued-logic null
  guards, IN-list / IN-subquery, BETWEEN, IS NULL, EXISTS and
  NOT EXISTS (including the correlated-subquery path with
  entity-var unification + not-join fallback), LIKE / ILIKE,
  JSON operators (@>, <@, ?, ?|, ?&), and regex match.

`translate-expr` and `translate-predicate` are mutually recursive
through `translate-case-expr` (WHEN predicates) and
`translate-comparison` (operand expressions). Everything lives in
one namespace so the recursion is a regular in-ns `declare` and
doesn't cross any ns boundary.

Back-edge to sql.clj: the EXISTS / IN-subquery branches need to
recursively re-enter the parser for sub-SQL. Rather than import
`datahike.pg.sql` (which would create a load cycle), we receive
`parse-sql` through the ctx's `:parse-sql` slot — populated by
`datahike.pg.sql/parse-sql` at top-level ctx construction and
inherited by sub-ctxs (for CTE / correlated subquery / derived
tables) when they copy from their outer ctx.

Every translate-* fn takes a ctx first-argument (built by
`datahike.pg.sql.ctx/make-ctx`) and returns either a value, a
fresh var, or a vector of where-clauses (for the predicate side).
Side effects flow through the ctx's atoms (`:where-clauses`,
`:in-args`, `:param-placeholders`, `:entity-vars`, `:col->var`).

datahike.pg.sql.fns

SQL → Clojure function wrappers.

Every named function the translator can emit at query time lives
here, behind a stable (fully-qualified) symbol that Datahike's
`resolve-fn` looks up at execute time:

- Aggregate wrappers (`filter-sum` / `filter-avg` / …) — variants of
  the core Datahike aggregates that skip the `:__null__` sentinel and
  treat nil as SQL NULL. Needed because core's `sum`/`avg` throw on
  keyword arithmetic and `min`/`max` mis-order keyword vs number.

- `null-safe` — wraps a scalar fn so either-operand-is-NULL returns
  NULL. Used to adapt Clojure string/math fns (upper, length,
  arithmetic) to SQL 3-valued logic.

- SQL string fns (`sql-lpad`, `sql-position`, …) — thin wrappers
  with standard SQL semantics (1-based positions, padding behavior).

- PG-catalog-function stubs (`pg-table-is-visible`, `pg-format-type`,
  `pg-get-expr`) — the minimum needed so JDBC drivers and ORMs
  that probe the catalog don't trip over missing functions.

- Lookup tables:
    * `sql-aggregate->datalog` — SQL agg name → fully-qualified
      symbol. Emitted symbol resolves via
      `datahike.pg.sql/filter-*` which re-exports from this ns.
    * `sql-fn->clj-fn` — SQL fn name → actual IFn value. Used by
      `translate-function-call` at emit time to wrap in `null-safe`.

Runtime symbols re-exported from `datahike.pg.sql` so the
translator's emitted forms keep the old qualified path
(`'datahike.pg.sql/filter-count`, `'datahike.pg.sql/sql-+`, …)
for backward compat with already-prepared statements cached on
clients.
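The `null-safe` wrapper described above is small enough to sketch in full; this is an illustrative version under the stated semantics (any nil operand yields nil), not necessarily the exact implementation:

```clojure
;; Sketch of the null-safe wrapper: nil in → nil out, otherwise
;; delegate to the wrapped scalar fn. Illustrative implementation.
(require '[clojure.string :as str])

(defn null-safe [f]
  (fn [& args]
    (when (every? some? args)   ; any SQL NULL (nil) short-circuits to NULL
      (apply f args))))

((null-safe str/upper-case) "abc") ;; → "ABC"
((null-safe +) 1 nil)              ;; → nil
```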

datahike.pg.sql.params

Prepared-statement parameter substitution and PG OID inference.

Three concerns live together here because they all deal with the
`?` / `$N` placeholder lifecycle:

1. ParamRef record + `param-ref?`. Emitted by translators at Parse
   time inside tx-data / query structures, replaced by real values
   at Execute time via `substitute-params`.

2. `*bound-params*` dynamic var: when bound to a 1-indexed vector
   of resolved values, translator branches (e.g. the JdbcParameter
   expression) resolve placeholders in-line instead of emitting
   ParamRef. This lets the same translator body serve both
   prepared-Parse (returns structured AST with ParamRefs) and
   re-translation-during-Execute (returns fully-bound form).

3. OID inference (`infer-param-oid-for-column`,
   `insert-param-oids`, `update-param-oids`, `where-param-oids`) —
   walks a JSqlParser AST, maps each placeholder index to the PG
   type OID of the column it's bound against. Consumed by
   describeParams to populate the ParameterDescription message so
   pgjdbc / psycopg2 / etc. size their binary binds correctly.

`*parse-db*` is bound by parse-sql (higher-level) to the current
db snapshot so OID inference can consult :pg/type metadata that
(:schema db) doesn't surface.
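The ParamRef / `substitute-params` lifecycle described above can be sketched roughly like this (the record and fn names come from the docstring; the bodies here are illustrative only):

```clojure
(require '[clojure.walk :as walk])

;; Placeholder recorded at Parse time; idx is the 1-based $N position.
(defrecord ParamRef [idx])
(defn param-ref? [x] (instance? ParamRef x))

;; At Execute time, walk the structure and splice in the resolved values.
(defn substitute-params [form params] ; params: 1-indexed vector, [nil v1 v2 …]
  (walk/postwalk #(if (param-ref? %) (nth params (:idx %)) %) form))

(substitute-params [[:db/add -1 :person/name (->ParamRef 1)]]
                   [nil "Alice"])
;=> [[:db/add -1 :person/name "Alice"]]
```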

datahike.pg.sql.stmt

Statement-level translation: SELECT / INSERT / UPDATE / DELETE / CTE.

The top half of this namespace (translate-select + its join /
HAVING / materialization machinery) maps a PlainSelect AST to a
Datalog query-map plus row-formatting metadata the handler uses
at execute time.

The middle half handles INSERT / UPDATE / DELETE: `extract-value`
lifts JSqlParser literal/expression nodes into Clojure values,
`coerce-insert-value` adapts them to the target column's
:db/valueType, and `translate-insert` / `translate-update` /
`translate-delete` produce tx-data + (for UPDATE/DELETE) an
eids-walk query. INSERT RETURNING and UPDATE RETURNING land in
`extract-returning`; CHECK / UPDATE expressions evaluated
per-row at handler time go through `eval-check-predicate` /
`eval-update-expr`.

The bottom half implements CTEs: `translate-cte-branch`
materializes one WITH-clause body against an enriched db so the
outer SELECT / DML can reference it as a virtual table;
`translate-recursive-cte` handles the WITH RECURSIVE form by
iterating until the CTE converges.

The three blocks live in one namespace because they are mutually
recursive:

  translate-select → translate-recursive-cte        (WITH in SELECT)
  translate-recursive-cte → translate-select         (CTE body)
  translate-insert → translate-select                (INSERT ... SELECT)
  translate-insert → translate-recursive-cte         (INSERT ... WITH)
  translate-select → extract-value                   (scalar subqueries)

Dependencies on already-extracted namespaces are all one-way:

  stmt → expr (translate-expr, translate-predicate, …)
  stmt → ctx  (make-ctx, col-var!, resolve-column, …)
  stmt → fns  (aggregate lookups)
  stmt → params (ParamRef, *from-bindings*, *parse-db*)
  stmt → jsonb, schema, types   (type coercion + jsonb ops)
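For orientation, the rough shape `translate-select` aims for can be sketched like this (illustrative only; the actual query-map plus its row-formatting metadata are richer, and the attribute names here are hypothetical):

```clojure
;; SELECT name FROM person WHERE age > 30
;; corresponds to a Datalog query-map along these lines:
'{:find  [?name]
  :where [[?e :person/name ?name]
          [?e :person/age  ?age]
          [(> ?age 30)]]}
```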

datahike.pg.types

PostgreSQL type system registry for the PgWire compatibility layer.

Centralizes all type mappings between PostgreSQL OIDs, SQL type names, Datahike value types, and wire protocol format codes.

Authoritative source: PostgreSQL 19devel src/include/catalog/pg_type.dat

Four directions of mapping:

  1. SQL name → Datahike type (for CREATE TABLE)
  2. Datahike type → PG OID (for wire protocol RowDescription)
  3. PG OID → SQL name (for format_type() and information_schema)
  4. SQL name → category (for CAST type classification)
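A tiny sketch of the four mapping directions (the var names here are hypothetical; the OIDs shown are the real pg_type.dat values for text / int8 / bool):

```clojure
;; Direction 1: SQL name → Datahike type (CREATE TABLE)
(def sql-name->datahike {"text"    :db.type/string
                         "bigint"  :db.type/long
                         "boolean" :db.type/boolean})

;; Direction 2: Datahike type → PG OID (wire protocol RowDescription)
(def datahike->oid {:db.type/string 25, :db.type/long 20, :db.type/boolean 16})

;; Direction 3: PG OID → SQL name (format_type(), information_schema)
(def oid->sql-name {25 "text", 20 "bigint", 16 "boolean"})

;; Direction 4: SQL name → category (CAST type classification)
(def sql-name->category {"text" :string, "bigint" :numeric, "boolean" :boolean})
```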

datahike.pg.window

Window function post-processing engine.

Operates on query result rows (vectors of values) after the main Datalog
query has executed. Computes ROW_NUMBER, RANK, DENSE_RANK, NTILE,
PERCENT_RANK, CUME_DIST, running SUM/AVG/COUNT/MIN/MAX, and LAG/LEAD.

Architecture: matches Stratum's window.clj interface (same spec maps)
but uses row-based processing instead of columnar typed arrays.

Window spec format:
  {:op :row-number/:rank/:dense-rank/:ntile/:sum/:avg/:count/:min/:max/:lag/:lead
   :col-idx int          — column index for aggregate/offset ops (nil for ranking)
   :partition-by [idx ...] — column indices for partitioning
   :order-by [[idx :asc/:desc] ...] — column indices + direction for sorting
   :frame {:type :rows/:range :start bound :end bound}
   :offset int           — LAG/LEAD offset (default 1)
   :default val          — LAG/LEAD default value
   :ntile-n int}         — NTILE bucket count
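The row-based processing can be sketched for ROW_NUMBER over the spec's `:partition-by` / `:order-by` shape (a hypothetical helper assuming numeric order-by columns; the real engine covers all the ops and frames above):

```clojure
;; Group rows by the :partition-by columns, sort each partition by
;; :order-by, then append a 1-based row number to every row.
(defn row-number [rows {:keys [partition-by order-by]}]
  (let [part-key (fn [row] (mapv row partition-by))
        sort-key (fn [row]
                   ;; negate for :desc — adequate for this numeric sketch
                   (mapv (fn [[idx dir]] (cond-> (row idx) (= dir :desc) -))
                         order-by))]
    (->> rows
         (group-by part-key)
         vals
         (mapcat (fn [part]
                   (map-indexed (fn [i row] (conj row (inc i)))
                                (sort-by sort-key part)))))))

(row-number [["a" 3] ["a" 1] ["b" 2]]
            {:partition-by [0] :order-by [[1 :asc]]})
;; each row gains its 1-based rank within its partition
```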
