Stratum stores configured with :crypto-hash? true produce content-addressed
commits whose UUIDs are deterministic hashes of the payload. Any bytes-level
tampering on the underlying konserve blobs surfaces as a recomputed UUID that
no longer matches the address it's stored under. stratum.audit exposes the
verification surface that turns this property into actionable reports.
The audit namespace exposes the IAuditable protocol
(-merkle-root, -recompute-merkle-root) and a small
result-map vocabulary (:ok, :mismatch, :unsupported, :advisory,
:incomplete). Adapters that embed stratum inside a larger storage
substrate (e.g. as a Datalog secondary index) can implement
IAuditable on their wrapper types to compose stratum verification
with other engines' verification in one pass; the result-map shape is
deliberately stable so multi-engine drivers don't need translation
layers.
Audit verification only produces meaningful results when the store was
created with :crypto-hash? true:
(st/sync! ds store "main" {:crypto-hash? true})
Without crypto-hashing, dataset and index commit UUIDs are random, so the
"does the recomputed cid match the stored cid?" question is meaningless.
stratum.audit detects this and returns {:status :unsupported :reason :crypto-hash-disabled} rather than a false positive. crypto-hash? is a
per-write decision; you can mix crypto-hashed and random-uuid commits on
the same store, but only the crypto-hashed ones are verifiable.
verify-chainstratum.audit/verify-chain walks the dataset-commit DAG backwards from a
branch HEAD via :parents, recomputes each commit-id from its stored
snapshot, and reports anomalies:
(require '[stratum.audit :as audit])
(audit/verify-chain store)
;; => {:head #uuid "..."
;; :status :ok ;; :ok | :mismatch | :advisory | :incomplete
;; :commits [{:cid ... ;; one entry per visited commit
;; :recomputed ...
;; :parents [...]
;; :status :ok|:mismatch|:advisory}
;; ...]
;; :mismatches [] ;; subset of :commits where :status = :mismatch
;; :missing []} ;; cids referenced as :parents but not loadable
Options:
| Key | Default | Meaning |
|---|---|---|
:branch | "main" | Branch to read HEAD from (ignored when :head is supplied) |
:head | branch HEAD | Explicit commit-id to start the walk from |
:limit | Long/MAX_VALUE | Stop after this many commits — useful for large histories |
:deep? | false | Also walk every column's PSS tree under the head (see below) |
This is the layer-1 check: it catches tampering with the commit metadata
itself (the JSON-ish payload that gen-address hashes to produce the cid).
It does not load the column data — that's :deep? true's job.
A small convenience:
(audit/ok? (audit/verify-chain store))
;; => true / false
:deep? true — PSS tree walkWhen :deep? true, after the chain walk, every column's PSS (persistent
sorted set) tree under the head dataset commit is loaded from konserve and
walked node-by-node. For each node:
stratum.cached-storage/gen-address.{:type :audit/merkle-mismatch :address ... :expected ... :recomputed ...}.The result map gains a :deep entry summarising per-column status:
{:status :ok
:deep {:status :ok
:diffs []} ;; empty when clean; populated with column-keyed errors otherwise
...}
This is the layer-2 check: it catches bytes-level tampering on the
.ksv blobs that the PSS tree actually persists into. :deep? true reads
the full tree, so it's the expensive option — use it on a fresh schedule
(daily / weekly / pre-restore), not on every query.
IAuditableStratumDataset and PersistentColumnIndex both implement the
IAuditable protocol, which gives you a uniform API for verifying a live
value without going through verify-chain:
(audit/-merkle-root ds)
;; => #uuid "..." ;; current commit-id; nil if unsynced
(audit/-recompute-merkle-root ds)
;; => {:status :ok :root #uuid "..."}
;; or
;; {:status :mismatch :root nil :errors [...]}
;; or
;; {:status :unsupported :reason :unsynced | :no-store | :crypto-hash-disabled}
-merkle-root is cheap (returns the cached cid). -recompute-merkle-root
internally calls verify-chain with :deep? true against the live value's
store, so it has the same cost profile as a deep chain walk.
When stratum is embedded as a secondary index inside another storage
substrate, the wrapper type can implement IAuditable against its own
storage so a single multi-engine verification pass covers both layers.
verify-pss-tree-from-cold — column-level entryIf you want to verify a single column without walking the whole dataset:
(audit/verify-pss-tree-from-cold store pss-root-uuid)
;; => {:status :ok}
;; or
;; {:status :mismatch :errors [...]}
pss-root is the UUID found in an index commit snapshot under :pss-root.
The :verified option (an atom holding #{addresses}) shares dedup state
across calls when verifying multiple columns of one dataset, since stratum
often reuses identical PSS subtrees across commits.
:status | Meaning |
|---|---|
:ok | Everything verified clean |
:mismatch | At least one cid or node bytes differs from its recomputed UUID |
:unsupported | Crypto-hashing wasn't enabled (or the value is unsynced) |
:incomplete | A :parents cid couldn't be loaded — chain truncated |
:advisory | Non-fatal issue noted in commit metadata (reserved) |
A :mismatch is the actionable case: a stored blob's bytes no longer hash
back to the address it lives at, which means either the blob was modified
out-of-band or the hash algorithm itself changed (e.g., konserve format
upgrade with insufficient migration). The :errors vector pinpoints which
addresses + which node classes are involved.
doc/storage-and-indices.md — the merkle-mode
background that verify-chain builds on.datahike.index.audit — same
protocol shape, lets you write a single audit driver that covers both
the datahike primary indices and a stratum secondary index in one pass.Can you improve this documentation?Edit on GitHub
cljdoc builds & hosts documentation for Clojure/Script libraries
| Ctrl+k | Jump to recent docs |
| ← | Move to previous article |
| → | Move to next article |
| Ctrl+/ | Jump to the search field |