Datahike uses persistent data structures that enable structural sharing—each update creates a new version efficiently by reusing unchanged parts. This allows time-travel queries and git-like versioning, but storage grows over time as old snapshots accumulate.
Garbage collection removes old database snapshots from storage while preserving current branch heads.
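Don't confuse garbage collection with data purging: garbage collection removes old database snapshots to reclaim storage, whereas data purging deletes specific data (for example, for GDPR compliance).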
GC whitelists all current branches and marks snapshots as reachable based on a grace period. Snapshots older than the grace period are deleted from storage, but branch heads are always retained regardless of age.
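The retention rule described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the snapshot maps, the `:id` and `:created` keys, and the function name `deletable-snapshots` are hypothetical and do not reflect Datahike's internal data structures or its handling of deleted branches.

```clojure
;; Toy model only: the snapshot maps and `deletable-snapshots` are
;; hypothetical names for illustration, not part of Datahike.
(defn deletable-snapshots
  "Returns the set of snapshot ids that a GC pass with `grace-date`
  would delete in this toy model."
  [snapshots branch-heads grace-date]
  (->> snapshots
       (remove (fn [{:keys [id created]}]
                 (or (contains? branch-heads id)           ; branch heads are always kept
                     (not (.before created grace-date))))) ; created within the grace period
       (map :id)
       set))

(def snapshots
  [{:id :a :created #inst "2024-01-01"}   ; old intermediate snapshot
   {:id :b :created #inst "2024-01-05"}   ; old, but a branch head
   {:id :c :created #inst "2024-01-20"}]) ; newer than the grace date

(deletable-snapshots snapshots #{:b} #inst "2024-01-10")
;; => #{:a}
```

Only `:a` is selected: `:b` is older than the grace date but is a branch head, and `:c` was created after the grace date.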
```clojure
(require '[datahike.api :as d]
         '[superv.async :refer [<?? S]])

;; Remove only deleted branches, keep all snapshots
(<?? S (d/gc-storage conn))
;; => #{...} ; set of deleted storage blobs
```
Running without a date removes only deleted branches—all snapshots on active branches are preserved. This is safe to run anytime and reclaims storage from old experimental branches.
Note: gc-storage returns a core.async channel. Wrap the call in <?? to block until GC finishes, or call it without <?? to let it run in the background. GC requires no coordination and won't slow down transactions or reads.
Datahike's Distributed Index Space allows readers to access storage directly without coordination. This is powerful for scalability but means long-running processes might read from old snapshots for hours.
Examples of long-running readers:
The grace period ensures these readers don't encounter missing data. Snapshots created after the grace period date are kept; older ones are deleted.
```clojure
(require '[datahike.api :as d]
         '[superv.async :refer [<?? S]]) ; <?? and S are needed for the blocking calls below

;; Keep last 7 days of snapshots
(let [seven-days-ago (java.util.Date. (- (System/currentTimeMillis)
                                         (* 7 24 60 60 1000)))]
  (<?? S (d/gc-storage conn seven-days-ago)))

;; Keep last 30 days (common for compliance)
(let [thirty-days-ago (java.util.Date. (- (System/currentTimeMillis)
                                          (* 30 24 60 60 1000)))]
  (<?? S (d/gc-storage conn thirty-days-ago)))

;; Keep last 24 hours (for fast-moving data)
(let [yesterday (java.util.Date. (- (System/currentTimeMillis)
                                    (* 24 60 60 1000)))]
  (<?? S (d/gc-storage conn yesterday)))
```
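The grace-period examples repeat the same date arithmetic, so it may be convenient to factor it into a small helper. The sketch below assumes a hypothetical function named `grace-date`; it is not part of the Datahike API and only builds the `java.util.Date` that is passed to `d/gc-storage`.

```clojure
(import '(java.time Duration Instant)
        '(java.util Date))

;; Hypothetical helper, not part of Datahike.
(defn grace-date
  "Returns a java.util.Date `days` days before `now`
  (`now` defaults to the current instant)."
  ([days] (grace-date days (Instant/now)))
  ([days ^Instant now]
   (Date/from (.minus now (Duration/ofDays days)))))

;; Passing an explicit instant makes the result reproducible:
(grace-date 7 (Instant/parse "2024-01-08T00:00:00Z"))
;; => #inst "2024-01-01T00:00:00.000-00:00"
```

With `conn`, `<??`, and `S` set up as in the examples above, the seven-day call could then be written as `(<?? S (d/gc-storage conn (grace-date 7)))`.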
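Choosing a grace period: pick one at least as long as your longest-running reader, so that no snapshot is deleted while a reader may still be using it.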
Branch heads are always kept regardless of the grace period—only intermediate snapshots are removed.
Coming soon: Datahike will support automatic GC with configurable grace periods, eliminating manual maintenance.
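GC removes: deleted branches and snapshots older than the grace period.
GC preserves: all current branch heads, regardless of age, and snapshots created within the grace period.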
Remember: For deleting specific data (GDPR compliance), use data purging, not garbage collection.