This document covers Stratum's dataset type, persistence model, and temporal query capabilities.
StratumDataset is the primary table abstraction in Stratum. It wraps a collection of typed columns with schema, metadata, and persistence support.
```clojure
(require '[stratum.api :as st]
         '[stratum.dataset :as dataset]
         '[stratum.index :as idx])
```
```clojure
;; From raw arrays (query-only, no persistence)
(def ds (st/make-dataset
         {:price (double-array [10.0 20.0 30.0])
          :qty (long-array [1 2 3])}
         {:name "trades"}))

;; From indices (supports persistence, zone maps, O(1) fork)
(def ds (st/make-dataset
         {:price (idx/index-from-seq :float64 [10.0 20.0 30.0])
          :qty (idx/index-from-seq :int64 [1 2 3])}
         {:name "trades"}))
```
Options for `make-dataset`:

- `:name` - Dataset name string (default: `"unnamed"`)
- `:metadata` - User metadata map (stored in commits)

Supported column input types:

| Input Type | Internal Format | Persistence | Zone Maps |
|---|---|---|---|
| `long[]` | `{:type :int64 :data long[]}` | No | No |
| `double[]` | `{:type :float64 :data double[]}` | No | No |
| `String[]` | Dict-encoded `{:type :int64 :data long[] :dict String[]}` | No | No |
| `PersistentColumnIndex` | `{:type T :source :index :index idx}` | Yes | Yes |
Only index-backed columns support persistence (`st/sync!`) and O(1) forking (`st/fork`).
```clojure
(st/name ds)         ;; => "trades"
(st/row-count ds)    ;; => 3
(st/column-names ds) ;; => (:price :qty)
(st/schema ds)       ;; => {:price {:type :float64 :nullable? true} ...}
(st/columns ds)      ;; => normalized column map for the query engine
```
```clojure
;; Add a column (returns a new dataset, validates length)
(assoc ds :revenue (double-array [100.0 200.0 300.0]))

;; Remove a column
(dissoc ds :old-col)

;; Rename a column
(dataset/ds-rename-column ds :old-name :new-name)
```
Like Clojure collections, mutations require transient mode:
```clojure
;; CORRECT - transient → mutate → persistent
(-> ds
    dataset/ds-transient
    (dataset/ds-set! :price 0 99.0)
    (dataset/ds-append! {:price 40.0 :qty 4})
    dataset/ds-persistent!)

;; WRONG - will throw IllegalStateException
(dataset/ds-set! ds :price 0 99.0)
```
Only index-backed datasets support transient mode.
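To make the distinction concrete, here is a hedged sketch contrasting the two constructor styles from earlier; it assumes, as stated above, that `ds-transient` only works on index-backed datasets:

```clojure
;; Index-backed dataset: transient mutation is supported.
(def indexed-ds
  (st/make-dataset {:price (idx/index-from-seq :float64 [10.0 20.0 30.0])}
                   {:name "trades"}))

(-> indexed-ds
    dataset/ds-transient
    (dataset/ds-set! :price 0 99.0)
    dataset/ds-persistent!)

;; Array-backed dataset: query-only, so ds-transient is expected to throw.
(def array-ds
  (st/make-dataset {:price (double-array [10.0 20.0 30.0])}
                   {:name "trades"}))
```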
Forking is O(1) via structural sharing:
```clojure
(def fork (st/fork ds))
;; fork shares all data with ds;
;; mutations in fork's transient copy only the affected chunks (CoW)
```
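To see the copy-on-write behavior end to end, here is a hedged sketch combining `st/fork` with the transient API from the previous section; it assumes a fork's mutations never leak back into the original:

```clojure
(def fork (st/fork ds))   ;; O(1): shares all chunks with ds

;; Mutating the fork copies only the chunks it touches (CoW).
(def fork'
  (-> fork
      dataset/ds-transient
      (dataset/ds-set! :price 0 99.0)
      dataset/ds-persistent!))

;; ds still sees the original value at [:price 0]; fork' sees 99.0.
```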
```clojure
(require '[konserve.store :as kstore])

;; Create a store
(def store (kstore/create-store {:backend :file
                                 :path "/tmp/stratum-data"
                                 :id (java.util.UUID/randomUUID)}
                                {:sync? true}))

;; Save to a branch
(def saved (st/sync! ds store "main"))

;; Load from branch HEAD
(def loaded (st/load store "main"))

;; Load from a specific commit
(def at-commit (st/load store commit-uuid))

;; List branches
(stratum.storage/list-dataset-branches store)
;; => #{"main" "feature-1"}

;; Delete a branch (data reclaimed by GC)
(dataset/ds-delete-branch! store "feature-1")

;; Garbage-collect unreachable data
(st/gc! store)
```
Stratum uses konserve with the following key structure:
| Key | Contents |
|---|---|
| `[:datasets :branches]` | Set of branch names |
| `[:datasets :heads "main"]` | Branch HEAD commit UUID |
| `[:datasets :commits <uuid>]` | Dataset snapshot (columns, schema, metadata) |
| `[:indices :commits <uuid>]` | Index snapshot (PSS root, chunk size, stats) |
| `<uuid>` | PSS tree node (Leaf or Branch) |
Commits store the dataset's user metadata. Use `st/with-metadata` to attach per-commit info:
```clojure
(-> ds
    (st/with-metadata {"datahike/tx" 42
                       "synced-at" (System/currentTimeMillis)})
    (st/sync! store "main"))
```
Stratum supports querying datasets at specific points in time.
```clojure
;; Query at a specific commit
(st/q {:from "trades" :agg [[:sum :price]]}
      {:store store :as-of commit-uuid})

;; Query a branch HEAD
(st/q {:from "trades" :agg [[:sum :price]]}
      {:store store :branch "main"})
```
When syncing with Datahike tx metadata, query at a specific transaction point:
```clojure
(st/q {:from "trades" :agg [[:sum :price]]}
      {:store store :as-of-tx 42})
```
This walks the commit history and finds the most recent commit whose `"datahike/tx"` metadata is `<=` 42.
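Conceptually, the `:as-of-tx` resolution might look like the sketch below. The `commit-meta` and `commit-parent` accessors are hypothetical names for illustration only; the real traversal is internal to Stratum:

```clojure
;; Hedged sketch of :as-of-tx resolution. commit-meta and commit-parent
;; are illustrative names, not part of the public API.
(defn resolve-as-of-tx [store head-commit tx]
  (loop [commit head-commit]
    (when commit
      (let [m (commit-meta store commit)]
        (if (and (contains? m "datahike/tx")
                 (<= (get m "datahike/tx") tx))
          commit                                 ;; newest commit at or before tx
          (recur (commit-parent store commit)))))))
```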
```clojure
;; Resolve without querying
(def ds-at-time (st/resolve store "trades" {:as-of commit-uuid}))
(def ds-at-tx (st/resolve store "trades" {:as-of-tx 42}))
```
Datasets work directly as `:from` sources:
```clojure
;; Dataset as :from (preferred)
(st/q {:from my-dataset
       :where [[:> :price 100]]
       :agg [[:sum :price] [:count]]
       :group [:region]})

;; Index-backed datasets get zone-map pruning automatically
```
The query engine extracts normalized columns via `st/columns` and routes to the optimal execution strategy.