datahike.pg

Public API for pg-datahike — PostgreSQL access for Datahike.

pg-datahike embeds a PG-compatible adapter inside a Datahike process:
wire protocol, SQL translator, virtual pg_* / information_schema
catalogs, constraint enforcement, schema hints. Clients that speak PG
(pgjdbc, psql, psycopg2, Rails AR, Hibernate, Odoo, Flyway, …) talk
to a Datahike database without a PG install. This namespace is the
stable surface; `datahike.pg.*` sub-namespaces hold implementation
details that may be reorganized without warning.

## Starting a server

Single database:

  (require '[datahike.pg :as pg]
           '[datahike.api :as d])

  (d/create-database {:store {:backend :mem :id "demo"}
                      :schema-flexibility :write})
  (def conn (d/connect {:store {:backend :mem :id "demo"}}))

  (def srv (pg/start-server conn {:port 5432}))
  ;; … clients connect to localhost:5432 …
  (pg/stop-server srv)

Multiple databases served from one server — clients route via the
JDBC URL's database name (jdbc:postgresql://host:5432/prod):

  (def srv (pg/start-server {"prod"    prod-conn
                             "staging" staging-conn}
                            {:port 5432}))

Unknown database names are rejected with SQLSTATE 3D000.

Handlers are created per incoming connection by the server. For
direct in-process use (tests, REPL), bypass the wire layer:

  (def h (pg/make-query-handler conn))
  (.execute h "SELECT 1")

## Schema hints for native Datahike databases

A pre-existing Datahike database works immediately — attributes in
the schema become virtual tables. Hints let you customize that view
without changing the underlying schema:

  (pg/set-hint! conn :person/full_name {:column "name"})
  (pg/set-hint! conn :person/ssn       {:hidden true})
  (pg/set-hint! conn :person/company   {:references :company/id})

- `:column`     — rename the SQL column
- `:hidden`     — exclude the attr from SELECT * / information_schema
- `:references` — FK target for `:db.type/ref` attrs (makes
                  `JOIN c ON p.fk = c.pk` work without `db_id`)
- `:table`      — rename the SQL table for the attr's namespace

## Compat / opt-out

The handler rejects unsupported DDL (GRANT, REVOKE, POLICY,
ROW LEVEL SECURITY, CREATE EXTENSION, COPY) by default with
SQLSTATE 0A000. For apps that can't be retrofitted to avoid
emitting those — most ORMs — accept them silently per-feature or
by named preset:

  (pg/make-query-handler conn {:compat :permissive})
  (pg/make-query-handler conn {:silently-accept #{:grant :policy}})

See `datahike.pg.server/compat-presets` for the preset bundles.

## Temporal queries

Datahike's `as-of` / `since` / `history` are exposed through SET:

  SET datahike.as_of   = '2024-01-15T00:00:00Z';
  SET datahike.since   = '2024-01-01T00:00:00Z';
  SET datahike.history = 'true';
  RESET datahike.as_of;
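
The same variables work through the in-process handler shown
earlier; a hedged sketch, assuming the handler path honors these
SET variables the same way the wire path does:

  (def h (pg/make-query-handler conn))
  (.execute h "SET datahike.as_of = '2024-01-15T00:00:00Z'")
  (.execute h "SELECT name FROM person")  ;; evaluated against the as-of view
  (.execute h "RESET datahike.as_of")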

## Extending catalog surfaces

Libraries that expose app-level metadata (audit tables, health
checks, per-tenant views) can register additional virtual catalog
tables. Entries are {:schema [Datahike-schema entries including a
row-marker attr] :data-fn (fn [user-schema cte-db] rows)}.

  (pg/register-catalog-table!
    "app_metrics"
    {:schema [{:db/ident :app_metrics/key …}
              {:db/ident :app_metrics/val …}
              {:db/ident (datahike.pg.schema/row-marker-attr
                          "app_metrics") …}]
     :data-fn (fn [_us _db] (current-metric-rows))})

The table is then queryable via SQL: `SELECT * FROM app_metrics`.

datahike.pg.arrays

First-class array values for pgwire-datahike.

Mirrors PostgreSQL's `ArrayType` (src/include/utils/array.h): a
value carrying element-type, the element vector (possibly nested
for multi-dim), per-dimension sizes, and per-dimension lower
bounds. Subscripting, slicing, membership, containment, concat,
PG text-format codec.

Design notes:
- `elem-type` is a keyword matching our types/oid-* naming
  (`:int8`, `:text`, `:name`, `:bool`, `:float8`, …) — not an OID.
  Resolution to OID happens at the oid-infer / describeResult layer.
- `elements` is a Clojure vector. For 1-D arrays it's a flat
  vector of element values (`nil` represents SQL NULL). For
  multi-dim it's a nested vector — outermost level is the first
  dimension, innermost is the last. Always a uniform shape; PG
  rejects ragged arrays and so do we (the parser raises on a
  dimension mismatch).
- `dims` is the vector of per-dimension sizes. `nil` is shorthand
  for `[(count elements)]` — a 1-D array. For multi-dim,
  `(count dims) = ndim`, `(reduce * dims) = total leaf count`.
- `lbounds` is the vector of per-dimension lower bounds. `nil` is
  shorthand for `[1, 1, …]`. PG defaults to 1; non-1 lbounds come
  from `ARRAY[lo:hi]=…` literals (rare). Preserved through
  to-pg-text/from-pg-text round-trip when non-default.
- We deliberately do NOT use the `:__null__` sentinel inside
  arrays. Arrays are values we own end-to-end, so element NULLs
  are Clojure `nil`.
- Subscript semantics follow PG exactly: 1-indexed (offset by
  lbound), out-of-range returns `nil` (SQL NULL), NOT an error.
  `arr[0]` and `arr[-1]` return nil for the default lbound=1.
- `defrecord` gives us structural equality/hash — important for
  datahike/datalog round-trip (arrays flow through query results,
  bind params, and DISTINCT/GROUP BY without special handling).

Text codec follows PG (`src/backend/utils/adt/arrayfuncs.c`:
`array_out` / `array_in`):
- `{}` for empty
- Elements joined by `,`; nested dimensions emit nested `{…}`
- Quote `"…"` when element contains `,`, `"`, `\`, `{`, `}`, or
  whitespace, or is empty, or is the literal `NULL` (case-insensitive)
- Inside quotes: escape `\` and `"` with backslash
- NULL elements emit unquoted `NULL` token
- Booleans as `t`/`f` (PG text format for BOOL)
- Non-default lbounds emit a `[lo1:hi1][lo2:hi2]…=` prefix
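
A minimal sketch of the element-quoting rule above (illustration
only, not this namespace's implementation):

  (defn needs-quote? [^String s]
    (or (zero? (count s))                         ;; empty string
        (.equalsIgnoreCase "NULL" s)              ;; literal NULL, any case
        (boolean (some #(or (Character/isWhitespace (char %))
                            (contains? #{\, \" \\ \{ \}} %))
                       s))))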

datahike.pg.dump

Dump a pg-datahike database to portable PostgreSQL SQL.

This is the inverse of our `pg_dump --inserts | psql` import path.
It walks a Datahike connection's schema, reverse-engineers the SQL
DDL we originally translated from, and emits a sequence of SQL
strings that can be replayed against either pg-datahike OR a real
PostgreSQL server.

Output layout (mirrors `pg_dump`'s):

  -- Header
  SET client_encoding = 'UTF8';

  -- DDL: CREATE TABLE for every table + CREATE SEQUENCE for
  -- every named sequence. PRIMARY KEY / UNIQUE are inline; FKs
  -- are deferred to the bottom so data load order doesn't matter.
  CREATE TABLE "customer" (...);
  CREATE TABLE "order" (...);
  CREATE SEQUENCE "customer_id_seq";

  -- Data
  COPY "customer" (...) FROM stdin;
  ...
  \.

  -- FKs (or any other ALTER TABLE ADD CONSTRAINT) — last, so
  -- forward / cyclic references load cleanly.
  ALTER TABLE "order" ADD CONSTRAINT order_customer_id_fkey
    FOREIGN KEY (customer_id) REFERENCES customer (id);

  -- Sequence current values
  SELECT pg_catalog.setval('customer_id_seq', 42, true);

Two output shapes via :format:
  :inserts (default) — INSERT INTO ... VALUES (...); per row.
                       More universally re-loadable; works against
                       tier-1 of pg-datahike's own import path.
  :copy              — COPY ... FROM stdin (text format) per row.
                       More compact; requires the loader to drive
                       the COPY-IN sub-protocol (psql does this
                       automatically via \copy / pg_dump's stdin
                       path).

What we capture today:
  - column type (driven by :db/valueType + :pg/type override)
  - PRIMARY KEY (from :db.unique/identity)
  - UNIQUE (from :db.unique/value)
  - NOT NULL (from :pg/not-null annotation entity)
  - DEFAULT (from :pg/default-kind + :pg/default-value)
  - FOREIGN KEY (from :datahike.pg/references hint)
  - CHECK (from :pg/check-* entities)
  - sequences (from :__seq__/* entities)
  - all data rows (filtered by the :<table>/db-row-exists marker)

What we DON'T capture (yet):
  - VIEW / MATERIALIZED VIEW definitions
  - INDEXes (we don't materialize them; PG would create them
    fresh on the load side)
  - TRIGGERs / functions (we don't have these)
  - COMMENT ON
  - GRANT / REVOKE / OWNER (no role system)

Public API:
  (dump conn)                            ; lazy seq of SQL strings
  (dump conn {:format :copy})
  (dump conn {:sections #{:schema}})     ; or #{:data}, default :all
  (dump conn {:exclude-tables #{"x"}})  ; skip these table namespaces
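
A hedged usage sketch: writing the dump to a file that psql (or
pg-datahike's own import path) can replay. Only `dump` and its
options come from this namespace; the rest is plain Clojure I/O,
and `conn` is assumed to be a Datahike connection:

  (require '[clojure.java.io :as io])

  (with-open [w (io/writer "backup.sql")]
    (doseq [stmt (dump conn {:format :inserts})]  ;; one SQL string per element
      (.write w stmt)
      (.write w "\n")))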

datahike.pg.errors

Exception → PostgreSQL ErrorResponse classification for the pgwire
layer. Owns three things:

  1. Mapping a Throwable → SQLSTATE code (the wire ABI clients
     branch on).
  2. Producing a PG-shaped user-facing message (don't leak
     Datahike vocabulary like `:foo/bar`, `{:db/id 47, …}`).
  3. Populating the ErrorResponse fields (n / t / c / d / D / H)
     that ORMs read via `getServerErrorMessage()` /
     `Diagnostics.constraint_name`.

Throw sites within `datahike.pg.*` describe errors structurally:

  (throw (ex-info "<short internal description>"
                  {:error :undefined-column
                   :table "employee"
                   :column "dept_id"}))

The wire boundary (`classify-exception`) then derives:
  - SQLSTATE  via `error-categories`
  - message   via the category's `:format` fn (falls back to the
              throw site's own message when nil)
  - fields    via `extract-error-fields`

## Why centralised formatting

PG's own backend works the same way: throw sites call
`ereport(ERROR, errcode(ERRCODE_UNDEFINED_COLUMN), errmsg(...))`
with structured args; the wire layer (`pqcomm.c`) emits the protocol
message from the resulting `ErrorData` struct. There is no
`report_undefined_column()` helper at every throw site. That's the
pattern this namespace implements.

## Lookup order in classify-exception

  1. Explicit `:sqlstate` in ex-data (override; bypasses formatter).
  2. `:error` key in `error-categories` →
       - SQLSTATE from the registry
       - message from the entry's `:format` fn (or fallback)
  3. `:datahike/canceled` in ex-data → "57014" (cancelled).
  4. Datahike-emitted message regex (`classify-message`) — a
     last-resort safety net for unstructured errors thrown by
     Datahike core that we don't own. New pgwire throws should NOT
     rely on this.
  5. Fallback: "XX000" (internal_error).
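
For instance, a throw site that pins the SQLSTATE directly via
rule 1 instead of going through a category (statement text and
code chosen for illustration):

  (throw (ex-info "relation \"person\" already exists"
                  {:sqlstate "42P07"}))   ;; duplicate_table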

## Categories

Pgwire-side throw categories live in `error-categories`. Each entry
has a `:sqlstate` and an optional `:format` fn. Datahike-internal
`:error` keys map directly to SQLSTATEs (no formatter — Datahike's
own message text comes through, with the SQLAlchemy-class
missing-attribute case rewritten to PG vocabulary).

Canonical PG code list: postgres/src/backend/utils/errcodes.txt.

datahike.pg.jsonb

PostgreSQL jsonb type support for the PgWire compatibility layer.

Stores jsonb values as Clojure data structures (maps, vectors, strings,
numbers, booleans, nil) serialized to/from JSON strings via jsonista.

Implements PostgreSQL jsonb operators and functions as pure Clojure
functions that can be used as Datalog predicates or post-processing.
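
A minimal sketch, under the data representation above, of what
such a pure predicate can look like; the name and exact nesting
semantics are illustrative, not this namespace's actual `@>`
implementation:

  (defn jsonb-contains?
    [super sub]
    (cond
      ;; every key of `sub` present in `super` with a contained value
      (and (map? super) (map? sub))
      (every? (fn [[k v]] (and (contains? super k)
                               (jsonb-contains? (get super k) v)))
              sub)

      ;; every element of `sub` contained in some element of `super`
      (and (vector? super) (vector? sub))
      (every? (fn [v] (some #(jsonb-contains? % v) super)) sub)

      ;; scalars: plain equality
      :else (= super sub)))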

datahike.pg.main

Standalone CLI entrypoint for pg-datahike. Two subcommands:

  serve  (default) — boot a pgwire server. See `serve --help`.
  dump   — dump a pg-datahike database as portable PostgreSQL
           SQL. Output replays into either pg-datahike or real
           PostgreSQL via psql. See `dump --help`.

Usage:
  java -jar pg-datahike-VERSION-standalone.jar [serve] [SERVE-OPTS]
  java -jar pg-datahike-VERSION-standalone.jar dump [DUMP-OPTS]

The default subcommand is `serve` (so the existing
`java -jar pg-datahike.jar --port 5432` invocation keeps working
without explicitly typing `serve`).

datahike.pg.schema

Derives virtual PostgreSQL table definitions from Datahike schemas.

Maps attribute namespace prefixes to table names and attribute local names
to column names. For example:
  :person/name  → table 'person', column 'name'
  :person/age   → table 'person', column 'age'

Every virtual table gets an implicit 'db_id' column (the entity ID).
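
A minimal sketch of that mapping (helper name is illustrative, not
this namespace's API):

  (defn attr->table+column [attr]
    {:table  (namespace attr)
     :column (name attr)})

  (attr->table+column :person/name)  ;; => {:table "person", :column "name"}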

datahike.pg.server

PostgreSQL wire protocol server for Datahike.

Starts a pgwire-compatible server that accepts SQL queries from standard
PostgreSQL clients (psql, DBeaver, JDBC, Python/psycopg2, etc.) and
translates them to Datahike Datalog queries.

Usage:
  (require '[datahike.api :as d])
  (require '[datahike.pg.server :as pg])

  (def conn (d/connect cfg))
  (def server (pg/start-server conn {:port 5432}))
  ;; ... use any PostgreSQL client ...
  (pg/stop-server server)

datahike.pg.sql

SQL → Datahike Datalog translator.

Parses SQL strings using JSqlParser and translates the AST into Datahike
Datalog queries that can be executed by `datahike.api/q`.

The core mapping: attribute namespace prefixes become virtual table names.
  :person/name  → table 'person', column 'name'
  :person/age   → table 'person', column 'age'

Main entry points:
  (parse-sql sql schema)  → {:type :select :query {...} :args [...]}
                           | {:type :insert :tx-data [...]}
                           | {:type :system :result QueryResult}
                           | {:type :error :message str}
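
A hedged usage sketch; how `:query` and `:args` feed
`datahike.api/q`, and the `schema` / `conn` bindings, are
assumptions based on the shapes above:

  (require '[datahike.api :as d])

  (let [{:keys [type query args]} (parse-sql "SELECT name FROM person" schema)]
    (when (= type :select)
      (apply d/q query @conn args)))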

datahike.pg.sql.catalog

Virtual PostgreSQL catalog materialization + system-query routing.

Datahike has no real `pg_catalog.*` or `information_schema.*` tables —
pgwire synthesizes them on demand from the user schema. Every
catalog table the pg_* ecosystem expects is a `{:schema [...]
:data-fn (fn [user-schema cte-db] ...)}` entry that builds its row
set lazily when a query references it.

The registry is extensible at runtime: libraries built on top of
pg-datahike can add their own virtual catalog tables via
`register-catalog-table!` (e.g. Odoo's internal metadata tables,
a `pg_stat_activity`-style probe, etc.) without modifying this
namespace.

Two public entry points:
- `register-catalog-table!` / `unregister-catalog-table!`
  — the extension seam
- `extract-empty-catalog-shape` / `system-query?`
  — called by the wire handler to short-circuit common boot probes
    (pgjdbc's field-metadata, Hibernate's feature detection) into
    fast paths before JSqlParser even runs.

datahike.pg.sql.classify

Structural SQL classifier — routes statements to the right handler
before JSqlParser sees them.

Kills the regex sprawl across `system-query?`, `parse-sql`'s CT6
guard, and the arg-extraction regexes (savepoint names, advisory-
lock keys, temporal SET values) by feeding those sites a real
tokenizer.

The tokenizer handles enough of PG lexical syntax to classify
correctly in the face of keyword-inside-a-string, keyword-inside-
a-comment, dollar-quoted strings, and case mix. It does NOT parse
expressions — it yields a flat token stream for a keyword-dispatch
classifier.

Classifier output:
  {:kind        <statement-kind keyword>
   :name        <savepoint/variable name if relevant>
   :args        <vector of literal args — advisory keys, pg_sleep dur>
   :var         <SET/RESET/SHOW variable name>
   :value       <SET value (string) when captured>
   :reject-kind <opt-out knob key>
   :tag         <synthetic command tag on silent-accept>}

:kind :generic-sql means 'pass to JSqlParser unchanged'.

Non-goals: full PG lexer, expression parsing. The token-driven
preprocess-sql rewriter lives in datahike.pg.sql.rewrite and consumes
this tokenizer's output.
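
A hedged illustration of the output shape (the entry-point name
and the non-`:generic-sql` kind keywords are assumptions; the keys
are the ones documented above):

  ;; (classify "SAVEPOINT sp1")         => {:kind :savepoint, :name "sp1"}
  ;; (classify "SHOW search_path")      => {:kind :show, :var "search_path"}
  ;; (classify "SELECT 1 /* GRANT */")  => {:kind :generic-sql}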

datahike.pg.sql.coerce

Numeric coercion helpers for SQL value paths (CAST + INSERT/UPDATE).

Both `apply-sql-cast` (in stmt.clj) and `coerce-insert-value`
(also stmt.clj) used to inline `Long/parseLong` /
`Double/parseDouble` / `BigDecimal.` with subtly different
null/blank/overflow rules:

  * apply-sql-cast threw `NumberFormatException` on parse failure
    and `.longValue` on overflow (silently truncating).
  * coerce-insert-value's BigInteger branch silently truncated
    to long via `Number.longValue` — `.longValue` of `2^63` returns
    `Long/MIN_VALUE`, which is a wrong-value bug, not a parse bug.
  * The bigdec / float / double string branches each had their
    own `(try … (catch NumberFormatException _ val))` returning
    the original string on failure, which Datahike then rejected
    downstream with a generic schema error instead of `22P02`.

This namespace centralises those rules so every numeric write goes
through helpers that raise the right SQLSTATE:

  * `22003 numeric_value_out_of_range` when a value can't fit
    the target's range.
  * `22P02 invalid_text_representation` when a string can't be
    parsed as the target type.

Both errors are encoded as `ex-info` with `:sqlstate`; the wire
layer's `handler.clj` already lifts those into ErrorResponse
messages.
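
A minimal sketch of one such helper under the two rules above
(helper name and message wording are illustrative):

  (require '[clojure.string :as str])

  (defn str->int8 [^String s]
    (let [t (str/trim s)]
      (if-not (re-matches #"[+-]?\d+" t)
        ;; not an integer literal at all -> invalid_text_representation
        (throw (ex-info (str "invalid input syntax for type bigint: \"" s "\"")
                        {:sqlstate "22P02"}))
        (try
          (Long/parseLong t)
          ;; syntactically valid but too wide -> numeric_value_out_of_range
          (catch NumberFormatException _
            (throw (ex-info "bigint out of range" {:sqlstate "22003"})))))))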

datahike.pg.sql.copy

Token-driven hand-parser for `COPY ... FROM STDIN` (and `COPY ...
TO STDOUT`, deferred). JSqlParser 5.x doesn't recognise COPY at
all — UnsupportedStatement — so the wire-protocol layer needs
structured access before it can drive the COPY-IN sub-protocol.

Also exposes `row->entity-map`: shared helper used by the COPY-IN
exec handler (server.clj) to turn a vector of decoded
String|::null values into a Datahike entity map keyed by
`:<ns>/<col>` attributes, with per-column string→type coercion
driven by `:db/valueType`.

Mirrors the structure of `datahike.pg.sql.database`: tokenise,
parse the prefix (table + optional column list + FROM/TO target),
then parse the option list in either:

  - **Modern paren form** — `WITH (key [=] value [, ...])`
  - **Legacy keyword form** — `[WITH] kw1 kw2 ...` (e.g.
    `WITH BINARY`, `WITH CSV HEADER`, `DELIMITER '|' NULL '\N' CSV`)
    still in the wild from old pg_dump output.

PG syntax (from `../postgres/doc/src/sgml/ref/copy.sgml`):

  COPY [schema.]table [ ( col [, ...] ) ]
      FROM { 'file' | PROGRAM 'cmd' | STDIN }
      [ [ WITH ] ( option [, ...] ) ]

Options accepted:

  FORMAT          'text' | 'csv' | 'binary'
  DELIMITER       'X'                — single byte
  NULL            'X'                — null marker
  HEADER          BOOL | MATCH       — 1st row treatment
  QUOTE           'X'                — CSV quote char
  ESCAPE          'X'                — CSV escape char (defaults to QUOTE)
  FORCE_NOT_NULL  ( col, ... ) | *
  FORCE_NULL      ( col, ... ) | *
  FORCE_QUOTE     ( col, ... ) | *   — TO-only; we accept for COPY FROM as a no-op
  ENCODING        'X'                — accepted, ignored (UTF-8 internal)
  FREEZE          [ BOOL ]           — accepted, ignored
  DEFAULT         'X'                — defaults-marker (PG 16+)
  OIDS            [ BOOL ]           — legacy, removed in PG 12; rejected

Output shape (for COPY FROM STDIN):
  {:type :copy-from-stdin
   :ns string                ;; lowercase table namespace
   :table string             ;; original-case table name
   :columns [string ...]     ;; lowercase, or nil if no col-list given
   :options {:format :text|:csv|:binary
             :delimiter String
             :null-marker String
             :quote String
             :escape String
             :header :true|:false|:match
             :force-not-null #{string ...} | :all
             :force-null #{string ...} | :all
             :encoding String
             :freeze? boolean
             :default-marker String}}

Defaults (filled by `parse-copy-from`):
  text:  delimiter "\t", null-marker "\N"
  csv:   delimiter ",",  null-marker "",
         quote "\"", escape = quote, header :false
  binary: rejected at this layer (returns :feature-not-supported)
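
As an illustration, both option syntaxes above for the same COPY
should yield a parsed map of the documented shape, with values
following the CSV defaults listed above:

  ;; COPY person (name, age) FROM STDIN WITH (FORMAT csv, HEADER true)
  ;; COPY person (name, age) FROM STDIN WITH CSV HEADER
  {:type    :copy-from-stdin
   :ns      "person"
   :table   "person"
   :columns ["name" "age"]
   :options {:format      :csv
             :delimiter   ","
             :null-marker ""
             :quote       "\""
             :escape      "\""
             :header      :true}}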

datahike.pg.sql.copy.csv-format

PostgreSQL COPY-IN CSV-format decoder. Quote-aware state machine
matching PG's `CopyReadAttributesCSV` semantics from
`../postgres/src/backend/commands/copyfromparse.c:1827`.

CSV is a *different beast* from text format:

  - Backslash is a literal char (no escape sequences).
  - End-of-data marker `\.` is **not** recognised inside CSV
    streams (uses CopyDone / EOF instead).
  - NULL detection only fires on **unquoted** fields whose raw
    text matches the null marker. `""` is empty-string, never
    null (unless FORCE_NULL is set for that column).
  - Embedded delimiters / line terminators are allowed inside
    quoted fields.
  - Embedded quote chars in quoted fields are escaped by either:
      - doubling them (default: ESCAPE = QUOTE = `"`)
      - prefixing with the configured escape char

The state machine per row:

  - Start in NOT_QUOTED. Walk bytes:
      delimiter → end of field
      line terminator → end of row
      quote char → enter QUOTED, set saw_quote=true
      else → append to field
  - In QUOTED, walk bytes:
      escape char (peek next):
        if next is escape or quote → consume, append literal
        else fall through (treat as literal)
      quote char → exit QUOTED (back to NOT_QUOTED)
      else → append to field

  - At end of row, for each field:
      if !saw_quote AND raw == null_marker → field is ::null
      else → field is the de-escaped string

  - FORCE_NOT_NULL columns: skip the null check (always treated
    as non-null even if raw matches null_marker).
  - FORCE_NULL columns: NULL check applies even if quoted (so a
    quoted `""` matching null_marker becomes null instead of
    empty string).

Streaming API mirrors `text-format`:

  (def d (make-decoder opts))
  [d' rows eod?] = (decode-step d chunk)
  [rows eod?]   = (decode-finalize d')

`opts` keys: `:delimiter`, `:null-marker`, `:quote`, `:escape`,
`:header` (`:true|:false|:match`), `:force-not-null`, `:force-null`,
`:columns` (used when HEADER MATCH is on).

Output is a vector of vectors-of-(String|`::null`).
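
A hedged illustration under the CSV defaults (null marker "",
quote/escape `"`); whether completed rows come back from
`decode-step` or `decode-finalize` depends on buffering, so the
expected row is noted as a comment:

  (let [d (make-decoder {:delimiter "," :null-marker ""
                         :quote "\"" :escape "\"" :header :false})
        [d' rows _eod?] (decode-step d "a,,\"\"\n")]
    ;; per the NULL rule above, the unquoted empty field is ::null and
    ;; the quoted one is "": expected row ["a" ::null ""]
    (decode-finalize d'))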

datahike.pg.sql.copy.text-format

PostgreSQL COPY-IN text-format decoder. Pure data transformation:
bytes / string chunks in, vectors of fields out. Field values are
either Strings or the sentinel `::null` (used in place of `nil` so
downstream tx-data builders can distinguish a missing column from
an explicit NULL).

Spec source: `../postgres/doc/src/sgml/ref/copy.sgml` and
`../postgres/src/backend/commands/copyfromparse.c`.

Salient mechanics:

  - **Lines** are delimited by `\n`, `\r`, or `\r\n`. Once the
    first terminator is observed, the EOL type is locked for the
    rest of the stream — mid-stream switching is an error per PG.
  - **End-of-data** marker: a line containing exactly `\.`.
    Bytes after the EOD line are silently discarded.
  - **NULL marker**: configurable (default `\N`). Checked against
    the *raw* (pre-de-escape) field text before backslash
    processing.
  - **Field separator**: configurable single ASCII byte (default
    `\t`). Must be backslash-escaped to appear as a data byte.
  - **Backslash escapes** (recognised after delimiter/EOD parsing):
    `\b \f \n \r \t \v` for control chars; `\NNN` octal
    (1-3 digits); `\xNN` hex (1-2 digits); `\<any-other-char>`
    passes the char through literally.

Streaming API:

  (def d (make-decoder opts))
  (let [[d' rows eod?] (decode-step d chunk)]    ;; called per CopyData
    ...)
  (let [[rows eod?] (decode-finalize d')]        ;; called on CopyDone
    ...)

`opts` keys: `:delimiter` (1-char String, required),
             `:null-marker` (String, required).
Output: rows are vectors of (String | `::null`).
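
A hedged illustration with the stock text-format settings
(delimiter tab, null marker `\N`); the expected rows are noted as
a comment since exactly where they surface (step vs finalize) is
buffering-dependent:

  (let [d (make-decoder {:delimiter "\t" :null-marker "\\N"})
        [d' rows eod?] (decode-step d "Ada\t36\n\\N\t0\n\\.\n")]
    ;; expected rows [["Ada" "36"] [::null "0"]], with eod? true once
    ;; the \. end-of-data line has been seen
    (decode-finalize d'))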

datahike.pg.sql.ctx

Translation context for the SQL → Datalog translator.

The `ctx` is an immutable map of per-translation atoms that every
`translate-*` fn threads through. Atoms capture the side-effecting
state (fresh-var counter, collected where clauses, entity-var
bindings, prepared-statement placeholders, …) while the outer map
stays shareable with nested sub-translations.

Fn inventory:

- `resolve-column` / `resolve-inherited-attr` — map a JSqlParser
  `Column` reference to a Datahike attribute keyword, following the
  table-alias map and PostgreSQL INHERITS semantics.
- `make-ctx` — build a fresh context.
- `fresh-var!`, `entity-var!`, `add-clause!`, `col-var!` —
  primitives the translators call to allocate logic variables,
  assign entity bindings, append where clauses, and lazily
  produce `get-else`-backed column bindings.
- `materialize-arg!` — bind a composite expression to a fresh var.
- `null-guard-clauses` / `make-columns-optional!` — SQL
  3-valued-logic helpers.
- `collect-vars` — recursively gather `?v`-style symbols out of
  a translated form.

Fns originally marked `^:private` in sql.clj are promoted to
public here so the extracted translate-* namespaces can reach
them without re-exporting through the top-level sql ns.

datahike.pg.sql.database

CREATE DATABASE / DROP DATABASE token-driven parser plus the
`db-from-template` helper used to wire a server-level config
template to a per-database config.

JSqlParser 5.x has no AST class for either statement (returns
`UnsupportedStatement`), so we hand-parse before JSqlParser sees
the SQL. The grammar accepted is:

  CREATE DATABASE name [WITH] [(] [k [=] v [,]]* [)]
  DROP DATABASE [IF EXISTS] name [WITH (FORCE)]

`name` is either a bare identifier or a quoted identifier
("..."). Option values are 'string', "ident", number,
true|false, or a bare identifier.

Datahike-aware option keys are translated into datahike config:

  BACKEND               -> [:store :backend] keyword
  STORE_ID              -> [:store :id]      string
  PATH                  -> [:store :path]    string (file backend)
  HOST / PORT / USER /  -> [:store :*]       string|long
  PASSWORD / DBNAME       (pg backend)
  SCHEMA_FLEXIBILITY    -> :schema-flexibility keyword
  KEEP_HISTORY          -> :keep-history?    boolean
  INDEX                 -> :index            :datahike.index/<value>

PostgreSQL-only option keys (OWNER, TEMPLATE, ENCODING, LC_COLLATE,
LC_CTYPE, LOCALE, LOCALE_PROVIDER, ICU_LOCALE, ICU_RULES,
COLLATION_VERSION, TABLESPACE, ALLOW_CONNECTIONS, CONNECTION_LIMIT,
IS_TEMPLATE, OID, STRATEGY, REFRESH_COLLATION_VERSION) are silently
accepted with a NOTICE so pg_dump output round-trips.

Unknown option names raise `:syntax-error`.
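
As an illustration of the key translation above, a statement like
this (values are hypothetical) would map to roughly the following
Datahike config:

  ;; CREATE DATABASE analytics WITH (BACKEND = 'file',
  ;;                                 PATH = '/var/data/analytics',
  ;;                                 KEEP_HISTORY = true)
  {:store         {:backend :file
                   :path    "/var/data/analytics"}
   :keep-history? true}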

datahike.pg.sql.ddl

DDL translation: CREATE TABLE, CREATE SEQUENCE, constraint extraction.

Produces Datahike tx-data that installs schema attributes (`:db/ident`,
`:db/valueType`, `:db/cardinality`, `:db/unique`, `:pg/type`, and the
pgwire-specific `:pg/*` constraint metadata).

The predicate helpers (`identity-column?`, `column-is-primary-key?`,
`column-is-unique?`, `column-is-not-null?`, `specs-contain-seq?`,
`column-check-expr-text`, `column-default-spec`) walk JSqlParser's
`ColumnDefinition.getColumnSpecs` lists, which are raw strings — so
each predicate is a case-insensitive token matcher.

Entry points:
  - translate-create-table    — main; returns {:type :ddl :tx-data [...]}
  - translate-create-sequence — CREATE SEQUENCE name [...]

Helpers:
  - extract-ddl-constraints — CHECK / NOT NULL / UNIQUE / FK clauses
  - extract-inherits        — PostgreSQL INHERITS
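
A hedged illustration of the result shape: a small CREATE TABLE
and the kind of schema tx-data described above (the :pg/* constraint
annotation entities are omitted and the exact :pg/type values are
assumptions):

  ;; CREATE TABLE person (id BIGINT PRIMARY KEY, name TEXT)
  {:type    :ddl
   :tx-data [{:db/ident       :person/id
              :db/valueType   :db.type/long
              :db/cardinality :db.cardinality/one
              :db/unique      :db.unique/identity
              :pg/type        "bigint"}
             {:db/ident       :person/name
              :db/valueType   :db.type/string
              :db/cardinality :db.cardinality/one
              :pg/type        "text"}]}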

datahike.pg.sql.expr

Expression + predicate translation.

Converts JSqlParser expression nodes (arithmetic, comparisons, CASE,
CAST, function calls, IN / BETWEEN / EXISTS subqueries, JSON
operators, …) into Datalog find-element expressions and
where-clause predicates.

Two families of entry points:

- **Expression side** (`translate-expr`): evaluates to a value or
  binds a var. Produces `?v` symbols via materialization when the
  result feeds into a SELECT projection, ORDER BY key, aggregate
  argument, or outer SQL function call.

- **Predicate side** (`translate-predicate`): produces a vec of
  Datalog where-clauses that filter rows. Recursively handles
  AND / OR / NOT, comparison ops with three-valued-logic null
  guards, IN-list / IN-subquery, BETWEEN, IS NULL, EXISTS and
  NOT EXISTS (including the correlated-subquery path with
  entity-var unification + not-join fallback), LIKE / ILIKE,
  JSON operators (@>, <@, ?, ?|, ?&), and regex match.

`translate-expr` and `translate-predicate` are mutually recursive
through `translate-case-expr` (WHEN predicates) and
`translate-comparison` (operand expressions). Everything lives in
one namespace so the recursion is a regular in-ns `declare` and
doesn't cross any ns boundary.

Back-edge to sql.clj: the EXISTS / IN-subquery branches need to
recursively re-enter the parser for sub-SQL. Rather than import
`datahike.pg.sql` (which would create a load cycle), we receive
`parse-sql` through the ctx's `:parse-sql` slot — populated by
`datahike.pg.sql/parse-sql` at top-level ctx construction and
inherited by sub-ctxs (for CTE / correlated subquery / derived
tables) when they copy from their outer ctx.

Every translate-* fn takes a ctx first-argument (built by
`datahike.pg.sql.ctx/make-ctx`) and returns either a value, a
fresh var, or a vector of where-clauses (for the predicate side).
Side effects flow through the ctx's atoms (`:where-clauses`,
`:in-args`, `:param-placeholders`, `:entity-vars`, `:col->var`).
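
A hedged, shape-only illustration of the predicate side: a filter
such as `age > 30` would come back as a vector of where-clauses
along these lines (variable names, the `get-else` binding, and the
`:__null__` guard are simplifications of what the ctx helpers
actually emit):

  [[(get-else $ ?e :person/age :__null__) ?age]
   [(not= ?age :__null__)]
   [(> ?age 30)]]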

datahike.pg.sql.fns

SQL → Clojure function wrappers.

Every named function the translator can emit at query time lives
here, behind a stable (fully-qualified) symbol that Datahike's
`resolve-fn` looks up at execute time:

- Aggregate wrappers (`filter-sum` / `filter-avg` / …) — variants of
  the core Datahike aggregates that skip the `:__null__` sentinel and
  treat nil as SQL NULL. Needed because core's `sum`/`avg` throw on
  keyword arithmetic and `min`/`max` mis-order keyword vs number.

- `null-safe` — wraps a scalar fn so either-operand-is-NULL returns
  NULL. Used to adapt Clojure string/math fns (upper, length,
  arithmetic) to SQL 3-valued logic.

- SQL string fns (`sql-lpad`, `sql-position`, …) — thin wrappers
  with standard SQL semantics (1-based positions, padding behavior).

- PG-catalog-function stubs (`pg-table-is-visible`, `pg-format-type`,
  `pg-get-expr`) — the minimum needed so JDBC drivers and ORMs
  that probe the catalog don't trip over missing functions.

- Lookup tables:
    * `sql-aggregate->datalog` — SQL agg name → fully-qualified
      symbol. Emitted symbol resolves via
      `datahike.pg.sql/filter-*` which re-exports from this ns.
    * `sql-fn->clj-fn` — SQL fn name → actual IFn value. Used by
      `translate-function-call` at emit time to wrap in `null-safe`.

Runtime symbols are re-exported from `datahike.pg.sql` so the
translator's emitted forms keep the old qualified path
(`'datahike.pg.sql/filter-count`, `'datahike.pg.sql/sql-+`, …)
for backward compatibility with already-prepared statements cached
on clients.
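
A minimal sketch of the two wrapper ideas, assuming the `:__null__`
sentinel described above; these re-implementations are illustrative,
not the namespace's actual definitions:

  (require '[clojure.string :as str])

  (defn null-safe [f]
    ;; any NULL operand (nil or the sentinel) makes the result NULL
    (fn [& args]
      (when-not (some #(or (nil? %) (= :__null__ %)) args)
        (apply f args))))

  (defn filter-sum [xs]
    ;; SUM over non-NULL values only; all-NULL input stays NULL
    (let [vs (remove #(or (nil? %) (= :__null__ %)) xs)]
      (when (seq vs) (reduce + vs))))

  ((null-safe str/upper-case) nil)   ;=> nil
  (filter-sum [1 2 :__null__ 3])     ;=> 6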

datahike.pg.sql.oid-infer

Parse-time type inference for SELECT-item expressions.

Mirrors PostgreSQL's `exprType` (src/backend/nodes/nodeFuncs.c): each
node in the expression tree has a declared result OID, computed by
walking the AST during parse/analyze. PG looks up operator and
function signatures in `pg_operator` / `pg_proc`; we hardcode the
equivalent rules for the set of SQL we handle.

Consumed by `translate-select` to populate `:select-item-oids` on the
parsed map, which `describeResult` reads to emit correct
RowDescription OIDs via the Extended Query protocol — before Execute
runs, so value-based inference isn't available.

Return values:
  - A positive integer OID when the type is determined.
  - `nil` when we can't determine it (unknown function, unresolvable
    column, subquery, etc.) — callers fall back to TEXT (OID 25),
    matching the pre-existing behavior.
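
A toy illustration of that contract — OID when known, nil otherwise.
The OIDs are standard pg_type values; the function name and rule
table are hypothetical, not this namespace's internals:

  (require '[clojure.string :as str])

  (def ^:private oids {:bool 16 :int8 20 :text 25})   ; standard pg_type OIDs

  (defn infer-function-oid [fn-name]
    (case (str/lower-case fn-name)
      ("count" "sum")   (:int8 oids)
      ("upper" "lower") (:text oids)
      "starts_with"     (:bool oids)
      nil))                            ; unknown → caller falls back to TEXT

  (or (infer-function-oid "md5") (:text oids))   ;=> 25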

datahike.pg.sql.params

Prepared-statement parameter substitution and PG OID inference.

Three concerns live together here because they all deal with the
`?` / `$N` placeholder lifecycle:

1. ParamRef record + `param-ref?`. Emitted by translators at Parse
   time inside tx-data / query structures, replaced by real values
   at Execute time via `substitute-params`.

2. `*bound-params*` dynamic var: when bound to a 1-indexed vector
   of resolved values, translator branches (e.g. the JdbcParameter
   expression) resolve placeholders in-line instead of emitting
   ParamRef. This lets the same translator body serve both
   prepared-Parse (returns structured AST with ParamRefs) and
   re-translation-during-Execute (returns fully-bound form).

3. OID inference (`infer-param-oid-for-column`,
   `insert-param-oids`, `update-param-oids`, `where-param-oids`) —
   walks a JSqlParser AST, maps each placeholder index to the PG
   type OID of the column it's bound against. Consumed by
   describeParams to populate the ParameterDescription message so
   pgjdbc / psycopg2 / etc. size their binary binds correctly.

`*parse-db*` is bound by the higher-level `parse-sql` to the current
db snapshot so OID inference can consult `:pg/type` metadata that
`(:schema db)` doesn't surface.
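
A minimal sketch of the placeholder lifecycle under the description
above; the ParamRef field name and the exact `substitute-params`
signature are assumptions:

  (require '[clojure.walk :as walk])

  (defrecord ParamRef [idx])                     ; $1 → (->ParamRef 1)

  (defn substitute-params [form params]
    ;; replace every ParamRef with the 1-indexed value bound at Execute time
    (walk/postwalk
      #(if (instance? ParamRef %) (nth params (dec (:idx %))) %)
      form))

  (substitute-params [:db/add -1 :person/name (->ParamRef 1)] ["Alice"])
  ;=> [:db/add -1 :person/name "Alice"]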

datahike.pg.sql.rewrite

Token-driven SQL source rewrites. Normalize SQL before JSqlParser
sees it by excising or injecting source-level spans — all based on
positions captured by the datahike.pg.sql.classify tokenizer.

Each rule is a pure function `(tokens) -> seq of spans`, where a
span is `[start end replacement]`. The rewriter applies all
non-overlapping spans right-to-left (so earlier offsets stay
stable) and returns the new SQL string.

Why this exists: the previous preprocess-sql was a pile of regex
`str/replace` calls that could false-positive on keywords inside
string literals, dollar-quotes, or comments (`SELECT 'REFERENCES'`
was vulnerable to the inline-REFERENCES stripper). Token-based
rules only match tokens of the right kind, so a keyword inside a
:string or :comment is invisible to them.

Callers: sql/preprocess-sql.
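
A sketch of the right-to-left span application; the helper name is
illustrative, not this namespace's real entry point:

  (defn apply-spans [sql spans]
    ;; apply highest-offset spans first so earlier [start end] offsets stay valid
    (reduce (fn [s [start end replacement]]
              (str (subs s 0 start) replacement (subs s end)))
            sql
            (sort-by first > spans)))

  (apply-spans "SELECT a, b FROM t" [[7 8 "x"] [10 11 "y"]])
  ;=> "SELECT x, y FROM t"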

datahike.pg.sql.shape

Structural shape matching for long-form SELECT statements.

A companion to datahike.pg.sql.classify: classify routes by the first
few tokens; shape answers 'what does the rest of this SELECT look
like?'. Used by sql/system-query?* to identify pgjdbc + Odoo
catalog probes that can't be recognized by a leading keyword —
their identifying signal lives deep in the projection list, the
FROM clause, or buried qualified identifiers.

This replaces the substring-match branches the classifier
previously fell back to (str/includes? on lowercased SQL).
Substring matching is safe when the needle is a distinctive
SQL-only string like 'fk.conname as name', but it's a different
model than the rest of the pipeline: a string literal containing
that phrase would false-match. Tokenizing and matching on
structural features (qualified references, AS aliases, function
names) is immune to keyword-inside-string / keyword-inside-comment
hostile inputs.

API:
  (catalog-probe sql) → :get-fk-conname | :get-primary-keys
                      | :get-field-metadata | :empty-catalog | nil
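
For example — hedged, since the exact probe inventory is internal —
a needle that appears only inside a string literal is not a catalog
probe, so the structural matcher declines it:

  (catalog-probe "SELECT 'fk.conname as name'")   ;=> nil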

datahike.pg.sql.stmt

Statement-level translation: SELECT / INSERT / UPDATE / DELETE / CTE.

The top third of this namespace (`translate-select` + its join /
HAVING / materialization machinery) maps a PlainSelect AST to a
Datalog query-map plus row-formatting metadata the handler uses
at execute time.

The middle third handles INSERT / UPDATE / DELETE: `extract-value`
lifts JSqlParser literal/expression nodes into Clojure values,
`coerce-insert-value` adapts them to the target column's
:db/valueType, and `translate-insert` / `translate-update` /
`translate-delete` produce tx-data + (for UPDATE/DELETE) an
eids-walk query. INSERT RETURNING and UPDATE RETURNING land in
`extract-returning`; CHECK / UPDATE expressions evaluated
per-row at handler time go through `eval-check-predicate` /
`eval-update-expr`.

The bottom third implements CTEs: `translate-cte-branch`
materializes one WITH-clause body against an enriched db so the
outer SELECT / DML can reference it as a virtual table;
`translate-recursive-cte` handles the WITH RECURSIVE form by
iterating until the CTE converges.

The three blocks live in one namespace because they are mutually
recursive:

  translate-select → translate-recursive-cte        (WITH in SELECT)
  translate-recursive-cte → translate-select         (CTE body)
  translate-insert → translate-select                (INSERT ... SELECT)
  translate-insert → translate-recursive-cte         (INSERT ... WITH)
  translate-select → extract-value                   (scalar subqueries)

Dependencies on already-extracted namespaces are all one-way:

  stmt → expr (translate-expr, translate-predicate, …)
  stmt → ctx  (make-ctx, col-var!, resolve-column, …)
  stmt → fns  (aggregate lookups)
  stmt → params (ParamRef, *from-bindings*, *parse-db*)
  stmt → jsonb, schema, types   (type coercion + jsonb ops)
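
For orientation, one plausible query-map shape for a simple
`SELECT name, age FROM person WHERE age > 21` (illustrative only;
the real translate-select result also carries join, HAVING, and
row-formatting metadata not shown here):

  '{:find  [?name ?age]
    :where [[?e :person/name ?name]
            [?e :person/age ?age]
            [(> ?age 21)]]}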

datahike.pg.sql.template

Lexical INSERT-VALUES templater for the parse-sql fast path.

Bulk-load workloads (`pg_dump` replay, ETL pipelines, log
ingestion) repeatedly issue INSERT statements with the same
structural shape but varying literal values. The full parse-sql
pipeline — JSqlParser AST + pg-datahike SQL→Datalog translation
— costs ~1 ms/call dominated by JSqlParser, so a 46k-row Pagila
replay paid ~46 s on parsing alone.

The trick: every Pagila-style row of one table has the *same*
shape, just different values. If we replace the literal values
with `?` placeholders, the templated SQL string is shared across
all rows of that table — and our parse-sql LRU cache turns
every-row-but-the-first into a sub-µs cache hit. We only need a
fast lexical scan to extract literals; we don't need a real
parser.

This namespace exports:

  - `template-insert-sql`  — pure string transform; returns
                             `{:templated <sql> :literals [<tok>...]}`
                             or nil if the SQL doesn't match the
                             simple INSERT VALUES shape we know
                             how to template.
  - `parse-literal-token`  — best-guess Java type for one captured
                             literal (Long, Double, String, nil,
                             Boolean, …).
  - `typed-substitute`     — replace ParamRefs in a parsed result
                             with values coerced through
                             `coerce-insert-value` so types match
                             each column's `:db/valueType`.

Anything outside the simple INSERT VALUES shape returns nil from
`template-insert-sql` so the caller falls through to full
parse-sql. Conservative: false negatives (missed templating)
degrade gracefully to the existing path; we never produce wrong
tx-data.
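
A deliberately naive sketch of the templating idea, assuming
single-quoted strings without escapes and bare numeric literals;
the real `template-insert-sql` is a proper lexical scan and returns
nil for anything it can't handle safely:

  (require '[clojure.string :as str])

  (def ^:private literal-re #"'[^']*'|\b\d+(?:\.\d+)?\b")

  (defn naive-template-insert [sql]
    (when (re-find #"(?i)^\s*insert\s" sql)
      {:templated (str/replace sql literal-re "?")
       :literals  (vec (re-seq literal-re sql))}))

  (naive-template-insert
    "INSERT INTO actor (actor_id, first_name) VALUES (1, 'NICK')")
  ;=> {:templated "INSERT INTO actor (actor_id, first_name) VALUES (?, ?)"
  ;    :literals  ["1" "'NICK'"]}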

datahike.pg.sql.types

Parsers for `CREATE TYPE … AS ENUM (…)` and `CREATE DOMAIN`. Both
bypass JSqlParser (which can't / doesn't parse them) and produce
parsed maps the server's exec-* dispatch picks up.

Mirrors the shape of `datahike.pg.sql.database` (CREATE DATABASE):
classify routes the statement to a system-type bypass, this ns
tokenises and extracts the structured form, the server transacts
it as a registry entity (`:datahike.pg.enum/*` or
`:datahike.pg.domain/*`).

Why registry entities and not native Datahike types: enums are
semantically `string + check-in <values>`; domains are
`<base-type> + optional CHECK`. Both can be lowered transparently
when a column references them in CREATE TABLE — column-type
resolution looks the name up in the registry and expands. The
dump tool reads the registry back and emits CREATE TYPE / CREATE
DOMAIN headers before tables, so the round trip preserves the
full schema.
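
A hypothetical registry-entity shape for an enum — only the
`:datahike.pg.enum/*` prefix comes from the text above; the
individual attribute names are invented for illustration:

  ;; CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy');
  {:datahike.pg.enum/name   "mood"
   :datahike.pg.enum/values ["sad" "ok" "happy"]}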

datahike.pg.types

PostgreSQL type system registry for the PgWire compatibility layer.

Centralizes all type mappings between PostgreSQL OIDs, SQL type names,
Datahike value types, and wire protocol format codes.

Authoritative source: PostgreSQL 19devel src/include/catalog/pg_type.dat

Four directions of mapping:
1. SQL name → Datahike type (for CREATE TABLE)
2. Datahike type → PG OID (for wire protocol RowDescription)
3. PG OID → SQL name (for format_type() and information_schema)
4. SQL name → category (for CAST type classification)
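
A few representative rows of the kind of mapping the registry
centralizes (OIDs are standard pg_type values; the var names are
illustrative):

  (def sql-name->datahike {"bigint"  :db.type/long
                           "text"    :db.type/string
                           "boolean" :db.type/boolean})
  (def datahike->pg-oid   {:db.type/long 20, :db.type/string 25, :db.type/boolean 16})
  (def pg-oid->sql-name   {20 "bigint", 25 "text", 16 "boolean"})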

datahike.pg.window

Window function post-processing engine.

Operates on query result rows (vectors of values) after the main Datalog
query has executed. Computes ROW_NUMBER, RANK, DENSE_RANK, NTILE,
PERCENT_RANK, CUME_DIST, running SUM/AVG/COUNT/MIN/MAX, and LAG/LEAD.

Architecture: matches Stratum's window.clj interface (same spec maps)
but uses row-based processing instead of columnar typed arrays.

Window spec format:
  {:op :row-number/:rank/:dense-rank/:ntile/:sum/:avg/:count/:min/:max/:lag/:lead
   :col-idx int          — column index for aggregate/offset ops (nil for ranking)
   :partition-by [idx ...] — column indices for partitioning
   :order-by [[idx :asc/:desc] ...] — column indices + direction for sorting
   :frame {:type :rows/:range :start bound :end bound}
   :offset int           — LAG/LEAD offset (default 1)
   :default val          — LAG/LEAD default value
   :ntile-n int}         — NTILE bucket count
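
Example spec in the format above — ROW_NUMBER() OVER (PARTITION BY
column 0 ORDER BY column 2 DESC), with column indices positional
into the result rows:

  {:op           :row-number
   :col-idx      nil            ; ranking op, no value column
   :partition-by [0]
   :order-by     [[2 :desc]]}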
