Readme — io.replikativ/datahike 0.7.1617

Liking cljdoc? Tell your friends :D

Branch databases, not just code.

Datahike is a durable Datalog database with Datomic-compatible APIs and git-like semantics. Built on persistent data structures and structural sharing, database snapshots are immutable values that can be held, shared, and queried anywhere—without locks or copying.

Key capabilities:

🌐 Distributed Index Space: Read scaling without database connections—readers access persistent indices directly
🗄️ Flexible storage: File, LMDB, S3, JDBC, Redis, IndexedDB via konserve—choose what fits
🌍 Cross-platform: JVM, Node.js, Browser (Clojure, ClojureScript, JavaScript, Java APIs)
⚡ Real-time sync: WebSocket streaming with Kabel for browser ↔ server
🕰️ Time-travel: Query any historical state, full transaction audit trail (versioning API becoming stable)
🔒 GDPR-ready: Complete data excision for regulatory compliance
🚀 Production-proven: Tested with billions of datoms, deployed in government services

Distributed by design: Datahike is part of the replikativ ecosystem for decentralized data architectures.

Why Datalog?

Modern applications model increasingly complex relationships—social networks, organizational hierarchies, supply chains, knowledge graphs. Traditional SQL forces you to express graph queries through explicit joins, accumulating complexity as relationships grow. Datalog uses pattern matching over relationships: describe what you're looking for, not how to join tables.

As systems evolve, SQL schemas accumulate join complexity. What starts as simple tables becomes nested subqueries and ad-hoc graph features. Datalog treats relationships as first-class: transitive queries, recursive rules, and multi-database joins are natural to express. The result is maintainable queries that scale with relationship complexity. See Why Datalog? for detailed comparisons.

Time is fundamental to information: Most value derives from how facts evolve over time. Datahike's immutable design treats the database as an append-only log of facts—queryable at any point in history, enabling audit trails, debugging through time-travel, and GDPR-compliant data excision. Immutability also powers Distributed Index Space: database snapshots are values that can be shared, cached, and queried without locks.

You can find API documentation on cljdoc and articles on Datahike on our company's blog page.

We presented Datahike also at meetups,for example at:

Usage

Add to your dependencies:

We provide a small stable API for the JVM at the moment, but the on-disk schema is not fixed yet. We will provide a migration guide until we have reached a stable on-disk schema. Take a look at the ChangeLog before upgrading.

(require '[datahike.api :as d])


;; use the filesystem as storage medium
(def cfg {:store {:backend :file :path "/tmp/example"}})

;; create a database at this place, per default configuration we enforce a strict
;; schema and keep all historical data
(d/create-database cfg)

(def conn (d/connect cfg))

;; the first transaction will be the schema we are using
;; you may also add this within database creation by adding :initial-tx
;; to the configuration
(d/transact conn [{:db/ident :name
                   :db/valueType :db.type/string
                   :db/cardinality :db.cardinality/one }
                  {:db/ident :age
                   :db/valueType :db.type/long
                   :db/cardinality :db.cardinality/one }])

;; lets add some data and wait for the transaction
(d/transact conn [{:name  "Alice", :age   20 }
                  {:name  "Bob", :age   30 }
                  {:name  "Charlie", :age   40 }
                  {:age 15 }])

;; search the data
(d/q '[:find ?e ?n ?a
       :where
       [?e :name ?n]
       [?e :age ?a]]
  @conn)
;; => #{[3 "Alice" 20] [4 "Bob" 30] [5 "Charlie" 40]}

;; add new entity data using a hash map
(d/transact conn {:tx-data [{:db/id 3 :age 25}]})

;; if you want to work with queries like in
;; https://grishaev.me/en/datomic-query/,
;; you may use a hashmap
(d/q {:query '{:find [?e ?n ?a ]
               :where [[?e :name ?n]
                       [?e :age ?a]]}
      :args [@conn]})
;; => #{[5 "Charlie" 40] [4 "Bob" 30] [3 "Alice" 25]}

;; query the history of the data
(d/q '[:find ?a
       :where
       [?e :name "Alice"]
       [?e :age ?a]]
  (d/history @conn))
;; => #{[20] [25]}

;; you might need to release the connection for specific stores
(d/release conn)

;; clean up the database if it is not need any more
(d/delete-database cfg)

The API namespace provides compatibility to a subset of Datomic functionality and should work as a drop-in replacement on the JVM. The rest of Datahike will be ported to core.async to coordinate IO in a platform-neutral manner.

Refer to the docs for more information:

Why Datalog? - Query comparisons and when to use Datalog
Storage backends - choosing the right backend for your needs
Distributed architecture - Distributed Index Space and real-time sync
Versioning - git-like branching and merging (beta)
Norms - database migration system
Configuration
Schema flexibility
Time Variance - time-travel queries (as-of, history, since), audit trails, and GDPR-compliant purging
Garbage Collection - reclaim storage by removing old database snapshots
JavaScript API - Promise-based API for Node.js and browsers
CLI - native command-line tool (dthk)
Babashka pod - shell scripting with Datahike
libdatahike - C/C++ native library
Benchmarking
Differences to Datomic
Entity spec
Logging and error handling
Unstructured input support (experimental)
Backend development
Contributing to Datahike

For simple examples have a look at the projects in the examples folder.

Example Projects

Invoice creation demonstrated at the Dutch Clojure Meetup.

Relationship to Datomic and DataScript

Datahike provides similar functionality to Datomic and can be used as a drop-in replacement for a subset of it. The goal of Datahike is not to provide an open-source reimplementation of Datomic, but it is part of the replikativ toolbox aimed to build distributed data management solutions. We have spoken to many backend engineers and Clojure developers, who tried to stay away from Datomic just because of its proprietary nature and we think in this regard Datahike should make an approach to Datomic easier and vice-versa people who only want to use the goodness of Datalog in small scale applications should not worry about setting up and depending on Datomic.

Some differences are:

Datahike runs locally on one peer. A transactor might be provided in the future and can also be realized through any linearizing write mechanism, e.g. Apache Kafka. If you are interested, please contact us.
Datahike provides the database as a transparent value, i.e. you can directly access the index datastructures (hitchhiker-tree) and leverage their persistent nature for replication. These internals are not guaranteed to stay stable, but provide useful insight into what is going on and can be optimized.
Datahike supports GDPR compliance by allowing to completely remove database entries.
Datomic has a REST interface and a Java API
Datomic provides timeouts

Datomic is a full-fledged scalable database (as a service) built from the authors of Clojure and people with a lot of experience. If you need this kind of professional support, you should definitely stick to Datomic.

Datahike's query engine and most of its codebase come from DataScript. Without the work on DataScript, Datahike would not have been possible. Differences to Datomic with respect to the query engine are documented there.

When to Choose Datahike vs. Datomic vs. DataScript

Datahike

Pick Datahike when you need:

Distributed read scaling: Distributed Index Space enables massive read scalability without database connections
Cross-platform deployment: The only Datalog database running on JVM, Node.js, and browsers with the same API
Flexible storage: Switch backends (File, LMDB, S3, JDBC, Redis, IndexedDB) based on deployment needs
Open source control: Full access to study, modify, and deploy without vendor lock-in
Proven scale: Tested with billions of datoms in benchmarks and government production systems
Browser capabilities: Offline-first applications with IndexedDB and real-time WebSocket sync
Git-like workflows: Versioning, branching, and time-travel queries on your data
GDPR compliance: Complete data excision capabilities

Datahike is Datomic-API compatible, making migration straightforward if your needs change.

Datomic

Pick Datomic if you:

Want proven patterns: Extensive documentation, learning materials, and enterprise examples
Need maximum performance: Fastest query execution for complex joins
Free is sufficient: Binaries are free (Apache 2.0) though closed-source
AWS-committed: Planning Datomic Cloud deployment on AWS

Note: Datomic has no ClojureScript support and uses proprietary storage backends.

DataScript

Pick DataScript when you need:

Browser-only: No persistence needed, pure client-side state management
Maximum speed: In-memory queries with no I/O overhead
Lightweight: Minimal dependencies, "cheap as creating a Hashmap"
Ephemeral data: Application state that doesn't survive page reloads

DataScript is mature and battle-tested (Roam Research, Athens), but has no durable storage.

ClojureScript & JavaScript Support

Datahike has beta ClojureScript support for both Node.js (file backend) and browsers (IndexedDB with TieredStore for memory hierarchies).

JavaScript API (Promise-based):

const d = require('datahike');

const config = { store: { backend: ':mem', id: 'example' } };
await d.createDatabase(config);
const conn = await d.connect(config);

await d.transact(conn, {
  'tx-data': [{ ':name': 'Alice', ':age': 30 }]
});

const results = await d.q(
  '[:find ?name ?age :where [?e :name ?name] [?e :age ?age]]',
  d.db(conn)
);
// => [['Alice', 30]]

Browser with real-time sync: Combine IndexedDB storage with Kabel WebSocket middleware for offline-capable applications that sync to server when online.

See JavaScript API documentation for details.

npm package (preview):

npm install datahike@next

Native CLI tool (dthk): Compiled with GraalVM native-image for instant startup. Ships with file backend support, scriptable for quick queries and automation. Available in releases. See CLI documentation.

Babashka pod: Native-compiled pod available in the Babashka pod registry for shell scripting. See Babashka pod documentation.

Java API (libdatahike): C++ bindings enable embedding Datahike in non-JVM applications. Experimental Python bindings available: pydatahike. See libdatahike documentation.

Production Use

Swedish Public Employment Service

The Swedish Public Employment Service (Arbetsförmedlingen) has been using Datahike in production since 2024 to store and serve the Labour Market Taxonomy (Arbetsmarknadstaxonomin). This is a terminology consisting of more than 40,000 labour market concepts, primarily representing occupations and skills, used to encode labour market data both within Arbetsförmedlingen and externally.

Key facts:

Scale: 40,000+ concepts with complex relationships
Usage: Thousands of case workers access the taxonomy API daily across Sweden
Versioning: Transaction history provides full audit trail for regulatory compliance
Updates: Continuously maintained to reflect current labour market
Open source: API source code and benchmark suite are publicly available

Benchmarks: The Swedish government published performance benchmarks comparing Datahike to Datomic across a range of complex queries representative of real-world government workloads.

Proximum: Vector Search for Datahike

Coming soon: Proximum is a high-performance HNSW vector index designed for Datahike's persistent data model. It brings semantic search and RAG capabilities to Datahike while maintaining immutability and full audit history.

Key features (upcoming):

Fast HNSW (Hierarchical Navigable Small World) vector search
Immutable index snapshots—same git-like semantics as Datahike
Persistent data structures without mutation or locks
Dual-licensed: EPL-2.0 (open source) and commercial license

See datahike.io/proximum for details. Publication pending completion of current work.

Composable Ecosystem

Datahike is compositional by design—built from independent, reusable libraries that work together but can be used separately in your own systems. Each component is open source and maintained as part of the replikativ project.

Core libraries:

konserve: Pluggable key-value store abstraction with backends for File, LMDB, S3, JDBC, Redis, IndexedDB, and more. Use it for any persistent storage needs beyond Datahike.
kabel: WebSocket transport with middleware support. Build real-time communication layers for any application.
hasch: Content-addressable hashing for Clojure data structures. Create immutable references to data.
incognito: Extensible serialization for custom types. Serialize any Clojure data across networks or storage.
superv.async: Supervision and error handling for core.async. Build robust asynchronous systems.

Advanced:

replikativ: CRDT-based data synchronization for eventually consistent systems. Build collaborative applications with automatic conflict resolution.
distributed-scope: Remote function invocation with Clojure semantics across processes.

This modularity enables custom solutions across languages and runtimes: embed konserve in Python applications, use kabel for non-database real-time systems, or build entirely new databases on the same storage layer. Datahike demonstrates how these components work together, but you're not locked into our choices.

Roadmap and Participation

Instead of providing a static roadmap, we work closely with the community to decide what will be worked on next in a dynamic and interactive way.

How it works:

Go to GitHub Discussions and upvote the ideas you'd like to see in Datahike. When we have capacity for a new feature, we address the most upvoted items.

You can also propose ideas yourself—either by adding them to Discussions or by creating a pull request. Note that due to backward compatibility considerations, some PRs may take time to integrate.

Commercial Support

We are happy to provide commercial support. If you are interested in a particular feature, please contact us at contact@datahike.io.

License

Licensed under Eclipse Public License (see LICENSE).

Can you improve this documentation? These fine people already did:
Nikita Prokopov, Konrad Kühne, Christian Weilbach, Timo Kramer, Judith Massa, Judith, Anders Hovmöller, Rune Juhl Jacobsen, Yee Fay Lim, David Whittington, Tyler Pirtle, Ryan Sundberg, Robert Stuttaford, Francesco Sardo, zachcp, Coby Tamayo, jonasseglare, Nuttanart Pornprasitsakul, Mike Ivanov, Denis Krivosheev, Linus Ericsson, Matthias Nehlsen, Alejandro Gomez, Thomas Schranz, Vlad & JCEdit on GitHub

cljdoc builds & hosts documentation for Clojure/Script libraries

Keyboard shortcuts

`Ctrl`+`k`	Jump to recent docs
`←`	Move to previous article
`→`	Move to next article
`Ctrl`+`/`	Jump to the search field

Raise an issue Browse cljdoc source Chat on Slack

× close