Datalevin

🧘 Simple, fast and durable Datalog database for everyone 💽

:hear_no_evil: What and why

I love Datalog, why hasn't everyone used this already?

Datalevin is a simple durable Datalog database.

The rationale is to have a simple, fast and free Datalog query engine running on durable storage. It is our observation that many developers prefer the flavor of Datalog popularized by Datomic® over any flavor of SQL, once they get to use it. Perhaps it is because Datalog is more declarative and composable than SQL, e.g. the automatic implicit joins seem to be its killer feature.

Datomic® is enterprise-grade software, and its feature set may be overkill for some use cases. One thing that may confuse casual users is its temporal features. To keep things simple and familiar, Datalevin does not store transaction history, and behaves the same way as most other databases: when data are deleted, they are gone.

Datalevin started out as a port of the Datascript in-memory Datalog database to the Lightning Memory-Mapped Database (LMDB). It retains the library property of Datascript, and it is meant to be embedded in applications to manage state. Because data are persisted on disk in Datalevin, application state can survive application restarts, and data size can be larger than memory.

Datalevin relies on the robust ACID transactional database features of LMDB. Designed for concurrent read intensive workloads, LMDB is used in many projects, e.g. Cloudflare global configuration distribution. LMDB also performs well in writing large values (> 2KB). Therefore, it is fine to store documents in Datalevin.

Datalevin uses covering indices and has no write-ahead log, so once the data are written, they are indexed. There are no separate processes or threads for indexing, compaction or any other database maintenance work that compete with your applications for resources. Since Datalog is simply a more ergonomic query language than SQL, Datalevin can serve the role of an easier-to-use and more lightweight relational database (RDBMS), e.g. where SQLite is called for.

Independent from Datalog, Datalevin can be used as a fast key-value store for EDN data, with support for range queries, predicate filtering and more. The native EDN data capability of Datalevin should be beneficial for Clojure programs. One can use this feature in situations where something like Redis is called for, for instance.

Our goal is to simplify data storage and access by supporting diverse use cases and paradigms, because maximal flexibility is the core strength of a Datalog store. Datalevin may not be the fastest or the most scalable solution for any one particular use case, but it should support the greatest number of them in a coherent and elegant manner.

Using one data store for different use cases simplifies and reduces the cost of software development, deployment and maintenance. Therefore, we plan to implement necessary extensions to make Datalevin also a search engine, a production rule engine, a graph database, and a document database, since the storage and index structure of Datalevin is already compatible with all of them.

Presentation:

2020 London Clojurians Meetup

:truck: Installation

Clojure library

Datalevin is a Clojure library; simply add it to your project as a dependency and start using it!

If you use Clojure CLI and deps.edn, declare the dependency on Datalevin:

{:deps
 {datalevin/datalevin {:mvn/version "0.4.14"}}}

If you use the Leiningen build tool, add this to the :dependencies section of your project.clj file:

[datalevin "0.4.14"]

Native image and command line tool

Datalevin supports compilation into a GraalVM native image, which should perform better, because the native image version does not incur JNI overhead and uses a comparator written in C (see blog post).

The release contains a command line tool called dtlv that is built as a native image. It can be used to work with Datalevin databases in shell scripting, e.g. database backup/compaction, data import/export, query/transaction execution, and so on.

Download the pre-built binary for the amd64 platform (for now, you need to install LMDB first: brew install lmdb on macOS, sudo apt install liblmdb-dev on Ubuntu/Debian, until #38 is resolved):

Unzip, put it on your path, and execute dtlv help:

  Datalevin (version: 0.4.14)

Usage: dtlv [options] [command] [arguments]

Commands:
  copy  Copy a database, regardless of whether it is now in use
  drop  Drop or clear a database
  dump  Dump the content of a database to standard output
  exec  Execute database transactions or queries
  help  Show help messages
  load  Load data from standard input into a database
  repl  Enter an interactive shell
  stat  Display statistics of database

Options:
  -a, --all        Include all of the sub-databases
  -c, --compact    Compact while copying.
  -d, --dir PATH   Path to the database directory
  -D, --delete     Delete the sub-database, not just empty it
  -f, --file PATH  Path to the specified file
  -g, --datalog    Dump/load as a Datalog database
  -h, --help       Show usage
  -l, --list       List the names of sub-databases instead of the content
  -V, --version    Show Datalevin version and exit

Type 'dtlv help <command>' to read about a specific command.

Launch dtlv in rlwrap to get a better REPL experience, i.e. rlwrap dtlv.

If your application depends on Datalevin and you want to compile it to a GraalVM native image, read this note.

:tada: Library Usage

Use as a Datalog store

(require '[datalevin.core :as d])

;; Define an optional schema.
;; Note that pre-defined schema is optional, as Datalevin does schema-on-write.
;; However, attributes requiring special handling need to be defined in schema,
;; e.g. many cardinality, uniqueness constraint, reference type, and so on.
(def schema {:aka  {:db/cardinality :db.cardinality/many}
             ;; :db/valueType is optional, if unspecified, the attribute will be
             ;; treated as EDN blobs, and may not be optimal for range queries
             :name {:db/valueType :db.type/string
                    :db/unique    :db.unique/identity}})

;; Create DB on disk and connect to it, assume write permission to create given dir
(def conn (d/get-conn "/data/datalevin/mydb" schema))

;; Transact some data
;; Notice that :nation is not defined in schema, so it will be treated as an EDN blob
(d/transact! conn
             [{:name "Frege", :db/id -1, :nation "France", :aka ["foo" "fred"]}
              {:name "Peirce", :db/id -2, :nation "france"}
              {:name "De Morgan", :db/id -3, :nation "English"}])

;; Query the data
(d/q '[:find ?nation
       :in $ ?alias
       :where
       [?e :aka ?alias]
       [?e :nation ?nation]]
     @conn
     "fred")
;; => #{["France"]}

;; Retract the name attribute of an entity
(d/transact! conn [[:db/retract 1 :name "Frege"]])

;; Pull the entity, now the name is gone
(d/q '[:find (pull ?e [*])
       :in $ ?alias
       :where
       [?e :aka ?alias]]
     @conn
     "fred")
;; => ([{:db/id 1, :aka ["foo" "fred"], :nation "France"}])

;; Close DB connection
(d/close conn)
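
As a further illustration of the implicit joins mentioned in the rationale, here is a minimal sketch that joins across entities through a reference attribute. The :person/* and :org/* attributes, the sample data and the temporary directory are made up for this example; it only relies on the datalevin.core functions already shown above.

```clojure
(require '[datalevin.core :as d])

;; Hypothetical schema: a person refers to an organization via a ref attribute
(def join-schema {:person/name      {:db/valueType :db.type/string
                                     :db/unique    :db.unique/identity}
                  :person/works-for {:db/valueType :db.type/ref}
                  :org/name         {:db/valueType :db.type/string}})

;; Use a throwaway directory under the system temp dir (an assumption of this sketch)
(def join-dir (str (System/getProperty "java.io.tmpdir")
                   "/datalevin-join-demo-" (System/currentTimeMillis)))

(def join-conn (d/get-conn join-dir join-schema))

;; Negative :db/id values are temporary ids; the ref values point at them
(d/transact! join-conn
             [{:db/id -1, :org/name "Acme"}
              {:db/id -2, :person/name "Ann", :person/works-for -1}
              {:db/id -3, :person/name "Bob", :person/works-for -1}])

;; Reusing ?o and ?p across clauses is all it takes to join the
;; organization to the people who work for it
(def colleagues
  (d/q '[:find ?name
         :in $ ?org
         :where
         [?o :org/name ?org]
         [?p :person/works-for ?o]
         [?p :person/name ?name]]
       @join-conn
       "Acme"))

(d/close join-conn)
```

No join condition is written out: the shared logic variables express it, and the result is a set of tuples, as in the query above.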

Use as a key value store

(require '[datalevin.core :as d])
(import '[java.util Date])

;; Open a key value DB on disk and get the DB handle
(def db (d/open-kv "/data/datalevin/mykvdb"))

;; Define some table (called "dbi" in LMDB) names
(def misc-table "misc-test-table")
(def date-table "date-test-table")

;; Open the tables
(d/open-dbi db misc-table)
(d/open-dbi db date-table)

;; Transact some data, a transaction can put data into multiple tables
;; Optionally, data type can be specified to help with range query
(d/transact-kv
  db
  [[:put misc-table :datalevin "Hello, world!"]
   [:put misc-table 42 {:saying "So Long, and thanks for all the fish"
                       :source "The Hitchhiker's Guide to the Galaxy"}]
   [:put date-table #inst "1991-12-25" "USSR broke apart" :instant]
   [:put date-table #inst "1989-11-09" "The fall of the Berlin Wall" :instant]])

;; Get the value with the key
(d/get-value db misc-table :datalevin)
;; => "Hello, world!"
(d/get-value db misc-table 42)
;; => {:saying "So Long, and thanks for all the fish",
;;     :source "The Hitchhiker's Guide to the Galaxy"}

;; Delete some data
(d/transact-kv db [[:del misc-table 42]])

;; Now it's gone
(d/get-value db misc-table 42)
;; => nil

;; Range query, from unix epoch time to now
(d/get-range db date-table [:closed (Date. 0) (Date.)] :instant)
;; => [[#inst "1989-11-09T00:00:00.000-00:00" "The fall of the Berlin Wall"]
;;     [#inst "1991-12-25T00:00:00.000-00:00" "USSR broke apart"]]

;; Close key value db
(d/close-kv db)
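
The effect of specifying a key type can be sketched with integer keys. This is a minimal example under stated assumptions: the table name "scores", the sample data and the temporary directory are hypothetical, and the :all and :at-least range specifiers are assumed to be available alongside the :closed specifier used above.

```clojure
(require '[datalevin.core :as d])

;; Throwaway directory under the system temp dir (an assumption of this sketch)
(def kv-dir (str (System/getProperty "java.io.tmpdir")
                 "/datalevin-kv-demo-" (System/currentTimeMillis)))

(def kv (d/open-kv kv-dir))

;; Hypothetical table name
(def scores "scores")
(d/open-dbi kv scores)

;; Keys are declared as :long so that they are stored in numeric order
(d/transact-kv kv
               [[:put scores 10 "ten" :long]
                [:put scores 2 "two" :long]
                [:put scores 33 "thirty-three" :long]])

;; Scan the whole table; the same key type is passed when reading
(def in-order (d/get-range kv scores [:all] :long))

;; Only keys greater than or equal to 10
(def at-least-ten (d/get-range kv scores [:at-least 10] :long))

(d/close-kv kv)
```

Passing the key type on both the write and the read side is what lets the range scan follow numeric order rather than the order of the serialized EDN bytes.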

:green_book: Documentation

Please refer to the API documentation for more details.

:rocket: Status

Both Datascript and LMDB are mature and stable libraries. Building on top of them, Datalevin is extensively tested with property-based testing. It is also used in production at Juji.

Running the benchmark suite adopted from Datascript on an Ubuntu Linux server with an Intel i7 3.6GHz CPU and a 1TB SSD drive, here is how it looks.

[Figures: query benchmark, write benchmark]

In all benchmarked queries, Datalevin is faster than Datascript. Considering that we are comparing a disk store with a memory store, this result may be counter-intuitive. One reason is that Datalevin caches more aggressively, whereas Datascript chose not to do so (e.g. see this issue). Before we introduced caching in version 0.2.8, Datalevin was only faster than Datascript for single clause queries due to the highly efficient reads of LMDB. With caching enabled, Datalevin is now faster across the board. In addition, we will soon move to a more efficient query implementation.

Writes are slower than Datascript, as expected, because Datalevin writes to disk while Datascript is in memory. The bulk write speed is good, writing 100K datoms to disk in less than 0.5 seconds; the same data can also be transacted with all the integrity checks as a whole in less than 2 seconds. Transacting one datom or five datoms at a time takes more or less that much time.

In short, Datalevin is quite capable for small or medium projects right now. Large scale projects can be supported when distributed mode is implemented.

:earth_americas: Roadmap

These are the tentative goals that we try to reach as soon as we can. We may adjust the priorities based on feedback.

0.4.0 ~~Native image and native command line tool.~~ [Done 2021/02/27]
0.5.0 A new Datalog query engine with improved performance.
0.6.0 Datalog query parity with Datascript: composite tuples and persisted transaction functions.
0.7.0 Fully automatic schema migration on write.
0.8.0 As a production rule engine: implementing Rete/UL algorithm.
0.9.0 As a search engine: fuzzy fulltext search across multiple attributes.
0.10.0 As a graph database: implementing loom graph protocols.
0.11.0 As a document database: auto indexing of document fields.
1.0.0 Distributed mode with raft based replication.

We appreciate and welcome your contribution or suggestion. Please file issues or pull requests.

:floppy_disk: Differences from Datascript

Datascript, developed by Nikita Prokopov, "is built totally from scratch and is not related by any means to" Datomic®. Although currently a port, Datalevin differs from Datascript in more significant ways than just the difference in data durability:

As mentioned, Datalevin is not an immutable database, and there is no "database as a value" feature. Since history is not kept, transaction ids are not stored.
Datoms in a transaction are committed together as a batch, rather than being saved by with-datom one at a time.
Respects :db/valueType. Currently, most Datomic® value types are supported, except bigint, bigdec, uri and tuple. Values of the attributes that are not defined in the schema or have unspecified types are treated as EDN blobs, and are de/serialized with nippy.
Has a value leading index (VEA) for datoms with :db.type/ref type attribute; the attribute and value leading index (AVE) is enabled for all datoms, so there is no need to specify :db/index, similar to Datomic® Cloud. Does not have an AEV index, in order to save storage and improve write speed.
Attributes are stored in indices as integer ids, thus attributes in index access are returned in attribute creation order, not in lexicographic order (i.e. do not expect :b to come after :a). This is the same as Datomic®.
Has no features that are applicable only for in-memory DBs, such as DB as an immutable data structure, DB pretty print, etc.
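
A minimal sketch of what the absence of history means in practice: once an entity is retracted, it cannot be queried again, even after the database is closed and reopened. The directory, the data and the helper function all-names are made up for this example, and the :db.fn/retractEntity transaction function is assumed to be inherited from Datascript.

```clojure
(require '[datalevin.core :as d])

(def gone-schema {:name {:db/valueType :db.type/string
                         :db/unique    :db.unique/identity}})

;; Throwaway directory under the system temp dir (an assumption of this sketch)
(def gone-dir (str (System/getProperty "java.io.tmpdir")
                   "/datalevin-gone-demo-" (System/currentTimeMillis)))

;; Hypothetical helper: all names currently stored in the given DB
(defn all-names [db]
  (d/q '[:find ?name :where [_ :name ?name]] db))

(def gone-conn (d/get-conn gone-dir gone-schema))

(d/transact! gone-conn [{:db/id -1, :name "Frege"}
                        {:db/id -2, :name "Peirce"}])

;; Look up the entity id of Frege, then retract the whole entity
(def frege-id
  (ffirst (d/q '[:find ?e :where [?e :name "Frege"]] @gone-conn)))
(d/transact! gone-conn [[:db.fn/retractEntity frege-id]])

(def names-after-retract (all-names @gone-conn))
(d/close gone-conn)

;; Reopen the same directory: the remaining data survive, the retracted data do not
(def reopened (d/get-conn gone-dir gone-schema))
(def names-after-reopen (all-names @reopened))
(d/close reopened)
```

There is no earlier database value to go back to, so applications that need an audit trail have to record it themselves as ordinary data.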

This project would not have started without the existence of Datascript. We will continue submitting pull requests to Datascript with our improvements where they are applicable to Datascript.

:baby: Limitations

Attribute names have a length limitation: an attribute name cannot be more than 511 bytes long, due to LMDB key size limit.
Because keys are compared bitwise, for range queries to work as expected on an attribute, its :db/valueType should be specified.
Floating point NaN cannot be stored.
The maximum individual value size is 4GB. In practice, value size is determined by LMDB's ability to find large enough contiguous space on disk and Datalevin's ability to pre-allocate off-heap buffers in JVM for them.
The total data size of a Datalevin database has the same limit as LMDB's, e.g. 128TB on a modern 64-bit machine that implements 48-bit address spaces.
There's no network interface as of now, but this may change.
Currently only supports Clojure on JVM, but adding support for other Clojure-hosting runtimes is possible, since bindings for LMDB exist in almost all major languages and are available on most platforms. However, I would prefer other languages to work with a GraalVM compiled native library when it is available, so I only need to maintain one language binding.

:shopping: Alternatives

If you are interested in using the dialect of Datalog pioneered by Datomic®, here are your current options:

If you need time travel and rich features backed by the authors of Clojure, you should use Datomic®.
If you need an in-memory store that has almost the same API as Datomic®, Datascript is for you.
If you need an in-memory graph database, Asami is fast.
If you need features such as bi-temporal graph queries, you may try Crux.
If you need a durable store with some storage choices, you may try Datahike.
There was also Eva, a distributed store, but it is no longer in active development.
If you need a simple and fast durable store with a battle tested backend, give Datalevin a try.

Version: 0.4.14

License

Licensed under Eclipse Public License (see LICENSE).

Documentation contributors: Huahai Yang & rubinovitz
