pg-datahike

PostgreSQL access for Datahike. Embeds a PG-compatible adapter — wire protocol, SQL translator, virtual pg_* / information_schema catalogs, constraint enforcement, schema hints — inside a Datahike process. Clients that speak PostgreSQL (pgjdbc, psql, psycopg2, Rails ActiveRecord, Hibernate, Odoo, Flyway, Alembic) talk to Datahike without a PostgreSQL install.

The integration is bidirectional at the datom layer: tables created over SQL are normal Datahike schemas and queryable from Clojure via (d/q …); existing Datahike schemas are visible as SQL tables with no setup. Temporal queries (as-of, history, since) are exposed through SET datahike.as_of.

Status: beta. Ready for the workloads it has been tested against (pgjdbc ResultSetTest 80/80, Odoo 19 module boot + TestORM, Flyway-style migrations). Not a full PG dialect; see Compatibility.

Quickstart

Standalone server (no Clojure setup)

Each pg-datahike release ships a runnable uberjar on GitHub releases — JDK 17+ is the only prerequisite:

java -jar pg-datahike-VERSION-standalone.jar
# pg-datahike VERSION ready on 127.0.0.1:5432
#   backend:  file (~/.local/share/pg-datahike)
#   CREATE DATABASE:  enabled
#   databases: ["datahike"]

psql postgresql://datahike@localhost:5432/datahike

Useful flags (--help for the full list):

java -jar pg-datahike.jar --memory                      # ephemeral
java -jar pg-datahike.jar --port 15432 --data-dir /var/lib/dh
java -jar pg-datahike.jar --db prod --db staging        # pre-create dbs

The dump subcommand exports a database to portable PostgreSQL SQL — replay-ready in either pg-datahike or real PG via psql. See Migration & pg_dump interop:

java -jar pg-datahike.jar dump --data-dir /var/lib/dh --db prod \
                              --out prod.sql

Embedded library

;; deps.edn
{:deps {org.replikativ/datahike     {:mvn/version "LATEST"}
        org.replikativ/pg-datahike  {:mvn/version "LATEST"}}}

Native Datahike → SQL

A plain Clojure-created Datahike database is already a PG database:

(require '[datahike.api :as d]
         '[datahike.pg :as pg])

(def cfg {:store {:backend :memory :id (random-uuid)}
          :schema-flexibility :write})
(d/create-database cfg)
(def conn (d/connect cfg))

(d/transact conn
  [{:db/ident :person/id   :db/valueType :db.type/long
    :db/cardinality :db.cardinality/one :db/unique :db.unique/identity}
   {:db/ident :person/name :db/valueType :db.type/string
    :db/cardinality :db.cardinality/one}])

(d/transact conn [{:person/id 1 :person/name "Alice"}])

(pg/start-server conn {:port 5432})
;; psql postgresql://localhost:5432/datahike
;;   => SELECT * FROM person;
;;        id |  name
;;       ----+-------
;;         1 | Alice

SQL → Datahike

CREATE TABLE + INSERT over pgwire transact normal Datahike schemas and datoms. They're immediately queryable from Clojure:

;; Over any PG client:
;;   CREATE TABLE widget (sku TEXT PRIMARY KEY, weight INT);
;;   INSERT INTO widget VALUES ('A', 10), ('B', 20);

;; From Clojure:
(d/q '[:find ?sku ?w :where [?e :widget/sku ?sku] [?e :widget/weight ?w]]
     (d/db conn))
;; => #{["A" 10] ["B" 20]}

Four integration patterns

1. Multi-database server

One start-server call serves many Datahike connections; clients route on the JDBC URL's database name. Three modes for managing the registry:

Static — register from Clojure once

(pg/start-server {"prod"    prod-conn
                  "staging" staging-conn
                  "reports" reports-conn}
                 {:port 5432})
jdbc:postgresql://localhost:5432/prod      → prod-conn
jdbc:postgresql://localhost:5432/staging   → staging-conn
jdbc:postgresql://localhost:5432/nonsuch   → 3D000 invalid_catalog_name

SELECT current_database() returns the connected name; pg_database enumerates the registry.
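
Both work over any client; a sketch, assuming the virtual pg_database exposes the standard datname column:

-- connected to jdbc:postgresql://localhost:5432/prod
SELECT current_database();        -- => prod
SELECT datname FROM pg_database;  -- => prod, staging, reports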

Dynamic — mutate from Clojure at runtime

(def srv (pg/start-server {"main" main-conn} {:port 5432}))

(pg/add-database!    srv "extra"  extra-conn)   ; new client connections route to it
(pg/remove-database! srv "extra")               ; drop from the registry
(pg/databases        srv)                       ; #{"main"}

The Clojure-side and SQL-side mutation paths share the same atom, so changes made from either side are visible to both.
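
With SQL-driven provisioning enabled (next section), the interleaving looks like this sketch:

;; a client runs: CREATE DATABASE adhoc;
(pg/databases srv)                  ; => #{"main" "adhoc"}
(pg/remove-database! srv "adhoc")   ; later clients get 3D000 for /adhoc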

SQL-driven — clients self-provision via CREATE DATABASE

Set a :database-template (a partial Datahike config) and pgwire clients can use plain SQL to provision and tear down databases:

(pg/start-server {"datahike" boot-conn}
                 {:port 5432
                  :database-template {:store {:backend :memory}
                                      :schema-flexibility :write
                                      :keep-history? false}})
CREATE DATABASE myapp;                                    -- new memory store, fresh UUID id
CREATE DATABASE myapp2 WITH KEEP_HISTORY = true;          -- override per database
CREATE DATABASE histdb WITH (BACKEND = 'memory',          -- Yugabyte-style paren form also works
                              KEEP_HISTORY = true);
DROP DATABASE myapp;
DROP DATABASE IF EXISTS old_one;

Without a :database-template (or explicit :on-create-database / :on-delete-database hooks), CREATE/DROP DATABASE return SQLSTATE 0A000 feature_not_supported — provisioning from SQL is a deployment policy decision, not a default.

The accepted WITH clause maps option names case-insensitively to Datahike config:

WITH option                                Datahike config               Notes
BACKEND                                    [:store :backend] keyword     'memory', 'file', 'pg', 'redis'
STORE_ID                                   [:store :id]                  Defaults to a fresh UUID per CREATE if omitted
PATH                                       [:store :path]                File backend; {{name}} interpolation supported
HOST / PORT / USER / PASSWORD / DBNAME     [:store :*]                   pg / redis backends
SCHEMA_FLEXIBILITY                         :schema-flexibility keyword   'read' or 'write'
KEEP_HISTORY                               :keep-history? boolean
INDEX                                      :index keyword                'persistent-set' → :datahike.index/persistent-set
OWNER / TEMPLATE / ENCODING / LC_COLLATE / LC_CTYPE / LOCALE / TABLESPACE / …
                                           —                             PostgreSQL-only; silently accepted with a NOTICE for pg_dump round-trips

Unknown non-PG option names return SQLSTATE 42601 syntax_error. For full control bypassing the template, set the hooks directly:

(pg/start-server {"main" main-conn}
  {:port 5432
   :on-create-database (fn [name parsed-options]
                         ;; full Clojure freedom — derive config however
                         (let [cfg (operator-policy-for name parsed-options)]
                           (d/create-database cfg)
                           (d/connect cfg)))
   :on-delete-database (fn [name conn _parsed-options]
                         (d/release conn)
                         (d/delete-database (operator-config-for name)))})

2. Schema hints (native DBs)

:datahike.pg/* meta-attributes let you customize the SQL-side view of an existing Datahike schema without touching its structure:

(pg/set-hint! conn :person/full_name {:column "name"})      ; rename column
(pg/set-hint! conn :person/ssn       {:hidden true})        ; exclude from SQL
(pg/set-hint! conn :person/company   {:references :company/id}) ; FK target

After set-hint!, SELECT name FROM person works, ssn is invisible to SELECT * / information_schema.columns, and JOIN company c ON p.company = c.id resolves through Datahike's native ref semantics.
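
Over any PG client the hints then shape the SQL surface; a sketch (the aliases and company columns are illustrative):

-- :column hint: the attribute surfaces under the new name
SELECT name FROM person;
-- :hidden hint: ssn is absent from the column set
SELECT * FROM person;
-- :references hint: ref joins resolve through native ref semantics
SELECT p.name, c.name AS company
  FROM person p JOIN company c ON p.company = c.id;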

3. Temporal queries via SET

Datahike's time-travel primitives are exposed as session variables:

SET datahike.as_of   = '2024-01-15T00:00:00Z';  -- d/as-of
SET datahike.since   = '2024-01-01T00:00:00Z';  -- d/since
SET datahike.history = 'true';                  -- d/history
RESET datahike.as_of;

Transparent to the client — every subsequent query sees the chosen view of the db.

4. Git-like branching

Branching is cheap in Datahike: every transaction produces a new immutable commit, so a "branch" is just a named pointer at a commit UUID. Creation is O(1) — one konserve write, no data copy, no WAL replay, no container provisioning. pgwire exposes the read side and the admin ops through standard PG session variables and a small datahike.* function namespace.

-- Introspect
SELECT datahike.branches();            -- returns one row per branch
SELECT datahike.current_branch();
SELECT datahike.commit_id();           -- current session's db head
SELECT datahike.parent_commits();      -- immediate parents (0/1/2+)

-- Admin — konserve-level writes, don't go through the tx writer
SELECT datahike.create_branch('preview', 'main');
SELECT datahike.create_branch('from-cid', '69ea6ee1-…');  -- from commit UUID
SELECT datahike.delete_branch('preview');

-- Session view: three cuts on the same immutable log.
-- They compose — feature branch's state as of yesterday is just two SETs.
SET datahike.branch    = 'feature';
SET datahike.commit_id = '69ea6ee1-2feb-5b61-be14-5590b9e01e48';
SET datahike.as_of     = '2024-01-15T00:00:00Z';
RESET datahike.branch;

Or pin a branch at connect time via the JDBC URL's database name (same suffix pattern as the multi-DB registry):

jdbc:postgresql://localhost:5432/prod:feature   → prod-conn, pinned to :feature
jdbc:postgresql://localhost:5432/prod           → prod-conn, default branch

SET datahike.commit_id = '<uuid>' is Datahike-unique: no other PG-compatible database lets you query by an exact commit identifier.

Honest footnote for 0.1: writes always land on the connection's default branch, even when SET datahike.branch is active — pgwire reads respect the pinned branch, but the transaction writer is conn-wide. Users who need to write to a specific branch open a second connection on /<db>:<branch> or use the Clojure API (datahike.versioning/branch!, merge!, …). Post-0.1 item.

Migration & pg_dump interop

pg-datahike round-trips with real PostgreSQL via pg_dump SQL on both sides. Three concrete workflows:

Real PostgreSQL → pg-datahike (load a pg_dump file)

pg_dump output replays straight into pg-datahike via psql or any JDBC client. Schema-side coverage includes CREATE TABLE with FK constraints, CREATE SEQUENCE, DEFAULT nextval(...), CREATE TYPE … AS ENUM, CREATE DOMAIN, partitioned tables (parent + children). Data-side coverage includes INSERT (single + multi-VALUES) and COPY … FROM stdin (text and CSV formats). Run with the :pg-dump compat preset to silently accept the rest of pg_dump's noise (triggers, functions, materialized views, ALTER OWNER):

(pg/start-server registry {:port 5432 :compat :pg-dump})
psql -h localhost -p 5432 -U datahike -d datahike -f my_pg_dump.sql

Validated end-to-end against:

  • Chinook (15.6 k rows / 11 tables / FKs / NUMERIC / TIMESTAMP) — full bidirectional roundtrip, byte-identical per-row equality.
  • Pagila (50 k rows / 22 tables / ENUM / DOMAIN / partitioning / triggers / functions) — schema parses end-to-end, data loads in ~12 s (≈ 4 k rows/s).

Replay throughput depends on how the client sends the inserts. The fast paths are multi-statement Simple Query (what psql -f does), JDBC PreparedStatement.executeBatch, and explicit transactions — all of which commit batched rows through one underlying transact instead of per-row. Clients that send each INSERT as its own round trip are bound by per-call commit cost (~370 rows/s).
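
A minimal pgjdbc batch sketch along the fast path (assumes the Quickstart server and the widget table from earlier; pgjdbc must be on the classpath):

(import '(java.sql DriverManager))

(with-open [c (DriverManager/getConnection
               "jdbc:postgresql://localhost:5432/datahike" "datahike" "")]
  (.setAutoCommit c false)                  ; batch commits through one transact
  (with-open [ps (.prepareStatement c "INSERT INTO widget VALUES (?, ?)")]
    (doseq [[sku w] [["C" 30] ["D" 40]]]
      (.setString ps 1 sku)
      (.setInt ps 2 (int w))
      (.addBatch ps))
    (.executeBatch ps))
  (.commit c))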

pg-datahike → portable PG SQL (the dump tool)

datahike.pg.dump/dump walks a Datahike database and emits pg_dump-shaped SQL. The output replays into either pg-datahike or real PostgreSQL via psql:

(require '[datahike.pg.dump :as dump])

(spit "out.sql" (dump/dump-to-string conn))
;; or stream:
(doseq [stmt (dump/dump conn)] (println stmt))

Or via the standalone uberjar's dump subcommand:

java -jar pg-datahike.jar dump --data-dir DIR --db NAME --out out.sql
java -jar pg-datahike.jar dump --config datahike-config.edn --copy

Options: --inserts (default) / --copy, --schema-only / --data-only, --exclude-table NAME. The --config flag accepts a full Datahike config EDN — works for any konserve backend (file, jdbc, s3, redis, lmdb, …); store-id is auto-discovered from the persisted :db branch so you don't need it up front.
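
A minimal config EDN sketch for the file backend (the path is illustrative):

;; datahike-config.edn
{:store {:backend :file
         :path    "/var/lib/dh/prod"}
 :schema-flexibility :write
 :keep-history? true}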

Native Datahike databases dump too — no setup required

A plain Datahike database (created via d/transact, never touched by SQL) dumps as clean PG SQL out of the box:

  • :db.unique/identity → PRIMARY KEY NOT NULL
  • :db.unique/value → UNIQUE
  • :db.cardinality/many T → T[] with PG array literals ('{a,b}')
  • :db.type/ref → bigint (the entity-id)

For FK constraints in the dump, opt in with set-hint! on ref attrs:

(pg/set-hint! conn :order/customer {:references :customer/id})
;; → ALTER TABLE "order" ADD CONSTRAINT … FOREIGN KEY ("customer")
;;     REFERENCES "customer"("id");

This makes the "evaluate Datahike, fall back to real PG cleanly" story concrete: a native Datahike database can be exported to a PostgreSQL-shaped SQL file at any time, replayed on real PG, and queried there. No schema rewrite, no second source of truth.

Embedding without TCP

Bypass the wire layer for tests or in-process applications:

(def h (pg/make-query-handler conn))
(.execute h "CREATE TABLE person (id INT PRIMARY KEY, name TEXT)")
(.execute h "INSERT INTO person VALUES (1, 'Alice')")
(.execute h "SELECT * FROM person")

Compat with strict vs. permissive clients

By default the handler rejects unsupported DDL (GRANT, REVOKE, POLICY, ROW LEVEL SECURITY, CREATE EXTENSION, triggers, functions, …) with SQLSTATE 0A000 feature_not_supported. Real-world clients emit a lot of this noise unconditionally — opt in by named preset:

;; ORM noise: GRANT/REVOKE/POLICY/RLS/EXTENSION (Hibernate, Odoo, Rails)
(pg/make-query-handler conn {:compat :permissive})

;; pg_dump output: superset of :permissive that also accepts
;; CREATE TRIGGER / FUNCTION / PROCEDURE / AGGREGATE / RULE,
;; MATERIALIZED VIEW, ATTACH PARTITION, ALTER TYPE/DOMAIN, etc.
;; Use this when loading dump files from real PostgreSQL.
(pg/make-query-handler conn {:compat :pg-dump})

;; or accept specific kinds only
(pg/make-query-handler conn {:silently-accept #{:grant :policy}})

See datahike.pg.server/compat-presets for the preset bundles. The underlying :silently-accept set is also exposed on the standalone server: (pg/start-server … {:compat :pg-dump}).

Extending the catalog

Libraries can expose app-level metadata (audit tables, health checks, per-tenant views) as virtual catalog tables without forking this repo:

(pg/register-catalog-table!
 "app_metrics"
 {:schema [{:db/ident :app_metrics/key :db/valueType :db.type/string
            :db/cardinality :db.cardinality/one}
           {:db/ident :app_metrics/val :db/valueType :db.type/long
            :db/cardinality :db.cardinality/one}
           {:db/ident (datahike.pg.schema/row-marker-attr "app_metrics")
            :db/valueType :db.type/boolean :db/cardinality :db.cardinality/one}]
  :data-fn (fn [_user-schema _cte-db] (current-metric-rows))})

Then queryable as a regular table: SELECT * FROM app_metrics.

Compatibility

Tested against:

  • pgjdbc 42.7.x — ResultSetTest 80/80
  • psql / libpq
  • psycopg2 (Python)
  • Hibernate / Spring (via pgjdbc)
  • Rails ActiveRecord / Ecto (via pg gems)
  • Odoo 17 / 19 boot + module loading + TestORM suite
  • Flyway / Alembic migrations (via advisory locks)
  • Chinook + Pagila pg_dump fixtures load end-to-end (see Migration & pg_dump interop)

Modeled / first-class

  • CREATE SEQUENCE, nextval / currval / setval (with PG-correct non-transactional semantics — advance survives rollback, concurrent callers get distinct values), DEFAULT nextval('s'::regclass)
  • CREATE TYPE … AS ENUM (membership enforced on INSERT — non-members raise 22P02)
  • CREATE DOMAIN … [NOT NULL] [CHECK (…)] (CHECK predicate evaluated on INSERT — violations raise 23514, NOT NULL raises 23502; VALUE keyword resolves to the column value)
  • text[] arrays, tsvector (opaque round-trip), bytea, timestamp with time zone, CHARACTER(N), SERIAL
  • COPY … FROM stdin (text + CSV)
  • ALTER TABLE … ADD CONSTRAINT FOREIGN KEY … ON UPDATE CASCADE ON DELETE RESTRICT
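
For example, sequence advancement is non-transactional, matching PG:

CREATE SEQUENCE s;
SELECT nextval('s');   -- 1
BEGIN;
SELECT nextval('s');   -- 2
ROLLBACK;
SELECT nextval('s');   -- 3: the advance survived the rollback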

Accepted under :compat :pg-dump (no semantics, schema loads)

CREATE TRIGGER, CREATE FUNCTION (incl. $$…$$ body), CREATE PROCEDURE, CREATE AGGREGATE, CREATE RULE, CREATE OPERATOR, CREATE CAST, CREATE LANGUAGE, CREATE MATERIALIZED VIEW, ALTER TYPE / DOMAIN, ALTER TABLE … ATTACH PARTITION / DETACH PARTITION, ALTER … OWNER TO, \restrict/\unrestrict psql metacommands, pg_catalog.set_config(...). Partitioned tables load: the parent's PARTITION BY clause is stripped, partition children load as independent tables, and pg_dump's per-child INSERTs land where they should.
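
For instance, a pg_dump fragment like the following loads cleanly under the preset; the function body is parsed but never executed (widget is an illustrative table):

CREATE FUNCTION touch() RETURNS trigger AS $$
BEGIN RETURN NEW; END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER widget_touch BEFORE UPDATE ON widget
  FOR EACH ROW EXECUTE FUNCTION touch();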

Known gaps (by design or deferred)

  • No PL/pgSQL execution (CREATE FUNCTION's body is silently accepted but never invoked).
  • No LISTEN / NOTIFY.
  • FK ON DELETE CASCADE / SET NULL / SET DEFAULT rejected at DDL (RESTRICT is the default and only supported action).
  • Single public schema — CREATE SCHEMA silently accepted but no-op.
  • Cursor materialization is eager (entire result set held in memory).
  • No deferrable constraints.
  • Generated columns parse but aren't enforced.
  • ENUM ordering is lexicographic (string storage) rather than PG's declaration-order. ORDER BY on enum-typed columns may diverge.
  • Partitioned tables: data lives in the children, not the parent (pg-datahike doesn't model the partition relation). Live workloads that need partition-aware routing into the parent table are out of scope.
  • Constraint enforcement is one-directional in 0.1. SQL constraints declared via DDL — NOT NULL, CHECK, UNIQUE, foreign-key child-side and parent-side RESTRICT, DOMAIN CHECK / NOT NULL, ENUM membership — are enforced by the pgwire handler at write time (via a :db.fn/call wrapper that runs against the speculative txdb, mirroring PG IMMEDIATE constraint semantics). Direct (d/transact) writes from Clojure bypass these checks because Datahike's schema does not yet carry the corresponding constraint vocabulary. Use the SQL path for constrained inserts, or validate explicitly before transacting. A future release will lift enforcement into Datahike's tx layer so both paths are gated.
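
A sketch of that write-path asymmetry (the table and values are illustrative):

-- Over pgwire: the CHECK is enforced at write time
CREATE TABLE item (id INT PRIMARY KEY, qty INT NOT NULL CHECK (qty > 0));
INSERT INTO item VALUES (1, 0);   -- rejected with SQLSTATE 23514

The equivalent Clojure write, (d/transact conn [{:item/id 2 :item/qty 0}]), is not gated in 0.1, so validate first when writing constrained data from Clojure.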

Development

# One-time: compile Java sources.
bb prep

# Run unit tests (requires Java classes built):
bb test

# SQL-conformance suite:
bb sqllogictest

# Format / lint:
bb format           # check
clojure -M:ffix     # auto-fix
bb lint

# Start a dev server on :15432 + nREPL on :15433 for dual access:
clojure -A:test -M test/integration/start_pgwire.clj

# pgjdbc conformance:
test/integration/pgjdbc/setup.sh    # once, clones pgjdbc@REL42.7.5
test/integration/pgjdbc/run.sh      # ~5 min

See doc/integration-testing.md for the full three-layer testing model and ORM-harness setup.

Architecture

SQL → sql.classify   (tokenize + structural routing)
    → sql.shape      (structural matcher for catalog-probe SELECTs)
    → sql.rewrite    (token-driven source rewrites before JSqlParser)
    → JSqlParser
    → sql.*          (translate AST → Datalog / tx-data)
    → server         (handler dispatch, wire protocol)
    → Datahike

Module inventory (src/datahike/pg/):

Module            Role
sql               Top-level dispatch: preprocess + parse-sql + result-type routing.
sql.classify      Token-based routing for SET, SAVEPOINT, system fns, etc.
sql.rewrite       Span-based source rewriter for SQL shapes JSqlParser rejects
sql.shape         Structural SELECT probe matcher for catalog queries
sql.{ddl,stmt,expr,ctx,catalog,fns,params,coerce,oid_infer}
                  AST → Datalog / tx-data translation
sql.template      Lexical INSERT-VALUES templater (literal → ? rewrite + typed substitute) for parse-result reuse on bulk inserts
sql.database      CREATE/DROP DATABASE token-parser + provisioning helpers
sql.types         CREATE TYPE … AS ENUM / CREATE DOMAIN parsers (bypass JSqlParser); registry-entity model
sql.copy / sql.copy.{text_format,csv_format}
                  COPY … FROM stdin parser + format decoders
schema            Virtual-table derivation + :datahike.pg/* hint support
server            Handler reify, wire dispatch, constraint enforcement
dump              Reverse direction: walk schema + data, emit pg_dump-shaped SQL
main              Standalone CLI (serve / dump subcommands)
errors            Exception → SQLSTATE classification at the wire boundary
types / arrays / jsonb / window
                  Runtime value model + post-processing

License

Copyright © 2026 Christian Weilbach and contributors.

Released under the Eclipse Public License 2.0. See LICENSE.
