Current state restore for Datomic databases.
Restores a database by copying schema and current datom state (no history) from a source database to a destination database.
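## Restore Strategies

### Single-Pass (default, `:two-pass? false`)

Processes all attributes together, managing ref dependencies with a pending index. Best for databases with few ref attributes or simple dependency graphs.

### Two-Pass (`:two-pass? true`)

Processes non-ref attributes first, then ref attributes second. Benefits:

- Eliminates most pending overhead (pass 1 has zero pending)
- Near 100% batch efficiency in pass 1
- Minimal pending in pass 2 (only circular refs)
- Expected speedup: 1.5-2.5x for databases with many ref attributes
- Pass 1 supports parallel transactions (via `:tx-parallelism`)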
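A toy simulation makes the difference between the two strategies concrete. The sketch below is not the library's code. It makes three assumptions:

- datoms are plain maps with `:e`, `:a` and `:v`;
- a hypothetical set `ref-attrs` marks the ref attributes;
- a ref datom counts as pending when its target entity has not been written yet.

```clojure
(defn count-pending
  "Walks datoms in order and counts ref datoms whose target entity has not
  been written yet; in a single-pass restore these would wait in a pending
  index. ASSUMPTION: datoms are plain maps {:e :a :v}."
  [ref-attrs datoms]
  (loop [written #{}
         pending 0
         ds      (seq datoms)]
    (if-let [{:keys [e a v]} (first ds)]
      (let [waits? (and (contains? ref-attrs a) (not (contains? written v)))]
        (recur (if waits? written (conj written e))
               (if waits? (inc pending) pending)
               (next ds)))
      pending)))

(defn two-pass-order
  "Reorders datoms the way a two-pass restore processes them:
  all non-ref datoms first (pass 1), then all ref datoms (pass 2)."
  [ref-attrs datoms]
  (let [ref? #(contains? ref-attrs (:a %))]
    (concat (remove ref? datoms) (filter ref? datoms))))

;; Hypothetical data: order 1 refers to customer 2, which appears later.
(def ref-attrs #{:order/customer})
(def example-datoms
  [{:e 1 :a :order/id       :v "o-1"}
   {:e 1 :a :order/customer :v 2}
   {:e 2 :a :customer/name  :v "Ada"}])

(count-pending ref-attrs example-datoms)                            ;; => 1
(count-pending ref-attrs (two-pass-order ref-attrs example-datoms)) ;; => 0
```

In source order the ref datom has to wait for customer 2. After reordering, customer 2 already exists when the ref is processed.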
## Performance Tuning

The restore process has these main performance knobs:

- `:max-batch-size` (default 500) - Datoms per transaction. Higher = fewer transactions, faster overall.
- `:read-parallelism` (default 20) - Number of attributes read in parallel. Higher = faster reads.
- `:read-chunk` (default 5000) - Datoms per read chunk. Higher = fewer read calls.
- `:tx-parallelism` (default 4) - Parallel transaction workers for Pass 1. Only used when `:two-pass? true`. Higher = more concurrent transactions, limited by Datomic transactor throughput.
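The effect of `:max-batch-size` is mostly arithmetic: the number of transactions is at least the datom count divided by the batch size. The sketch below encodes the documented defaults in a plain map and computes that lower bound. The function name is hypothetical. Real runs can need more transactions, because batches are not always fully written (see `:batch-efficiency`).

```clojure
(def default-opts
  "Defaults as documented in this section."
  {:max-batch-size   500
   :read-parallelism 20
   :read-chunk       5000
   :tx-parallelism   4      ; only used when :two-pass? is true
   :two-pass?        false})

(defn min-tx-count
  "Lower bound on transactions: ceil(datom-count / max-batch-size).
  Partially written batches push the real number higher."
  [datom-count opts]
  (let [{:keys [max-batch-size]} (merge default-opts opts)]
    (long (Math/ceil (/ datom-count (double max-batch-size))))))

(min-tx-count 1000000 {})                      ;; => 2000
(min-tx-count 1000000 {:max-batch-size 2000})  ;; => 500
```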
## Performance Metrics

When `:debug true`, logs every 10 batches with these metrics:

- `:pending-count` - Datoms waiting on unresolved ref dependencies. High count → increase `:max-batch-size` or try `:two-pass? true`.
- `:last-tx-ms` / `:avg-tx-ms` - Transaction timing. High values → Datomic writes are slow and may be at their limit.
- `:batch-efficiency` - Percentage of each batch actually written. Low (< 80%) → increase `:max-batch-size` to resolve more refs.
- `:utilization-pct` - Channel buffer utilization (logged every 10s). High (> 80%) → writes are the bottleneck (good!). Low (< 20%) → reads are slow; increase `:read-parallelism`.
At completion:

- `:avg-tx-ms` - Overall average transaction time
- `:final-pending-count` - Should be 0
- `:duration-sec` - Total time to read all datoms
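The rules of thumb above can be expressed as a small function over one metrics snapshot. This is a sketch, not part of the library, and it rests on two assumptions:

- `:batch-efficiency` and `:utilization-pct` are percentages in 0-100.
- This page gives no number for a "high" `:pending-count`, so the cut-off is a hypothetical, caller-supplied `pending-threshold`.

```clojure
(defn efficiency-pct
  "Percentage of a batch that was actually written (the rest waited on refs)."
  [written batch-size]
  (/ (* 100.0 written) batch-size))

(defn tuning-hints
  "Maps a debug metrics snapshot to the advice listed in this section.
  pending-threshold is a hypothetical cut-off chosen by the caller."
  [{:keys [pending-count batch-efficiency utilization-pct]} pending-threshold]
  (cond-> []
    (and pending-count (> pending-count pending-threshold))
    (conj "increase :max-batch-size or try :two-pass? true")

    (and batch-efficiency (< batch-efficiency 80))
    (conj "increase :max-batch-size to resolve more refs")

    (and utilization-pct (> utilization-pct 80))
    (conj "writes are the bottleneck (expected)")

    (and utilization-pct (< utilization-pct 20))
    (conj "reads are slow: increase :read-parallelism")))

(tuning-hints {:pending-count 10
               :batch-efficiency (efficiency-pct 350 500)
               :utilization-pct 15}
              1000)
;; => ["increase :max-batch-size to resolve more refs"
;;     "reads are slow: increase :read-parallelism"]
```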
(-full-copy {:keys [source-db schema-lookup dest-conn max-batch-size debug
read-parallelism read-chunk init-state attrs]})
(add-tuple-attrs! {:keys [dest-conn tuple-schema]})
Adds tuple attributes to schema and establishes their composite values.
(batch-datoms! datom-ch work-ch max-batch-size eid->schema max-bootstrap-tx)
Batches datoms from input channel into batches of at most max-batch-size datoms. Filters out schema and bootstrap datoms. Closes work-ch when input channel closes.
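The batching step is easier to follow over a plain sequence than over channels. In this sketch (hypothetical name `batch-datoms-seq`), datoms are maps with `:e` and `:tx`. Both filtering rules are assumptions about how the real function uses its arguments:

- a datom counts as schema when its entity id is a key of `eid->schema`;
- a datom counts as bootstrap when its `:tx` is at most `max-bootstrap-tx`.

```clojure
(defn batch-datoms-seq
  "Sequence version of the batching step: drops schema and bootstrap datoms,
  then groups the rest into vectors of at most max-batch-size datoms.
  ASSUMPTION: schema datoms are those whose :e is a key of eid->schema;
  bootstrap datoms are those whose :tx <= max-bootstrap-tx."
  [datoms max-batch-size eid->schema max-bootstrap-tx]
  (->> datoms
       (remove (fn [{:keys [e tx]}]
                 (or (contains? eid->schema e)
                     (<= tx max-bootstrap-tx))))
       (partition-all max-batch-size)
       (map vec)))
```

Over channels, the same shape would accumulate datoms into a vector and put it on `work-ch` whenever it reaches `max-batch-size`. When `datom-ch` closes, it would flush the remainder and close `work-ch`.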
(batch-datoms-partitioned! datom-ch
worker-channels
max-batch-size
eid->schema
max-bootstrap-tx
debug)
Batches datoms from input channel and routes to worker channels based on entity ID. Uses consistent hashing to ensure all datoms for the same entity go to the same worker. This prevents entity splitting when max-batch-size is smaller than an entity's datom count. Filters out schema and bootstrap datoms. Closes all worker channels when input channel closes.
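Routing by entity id can be sketched with `hash` and `mod`. The same id always maps to the same worker, so an entity's datoms never reach two workers. The exact hash the library uses is not documented here: `hash`/`mod` is an illustrative choice, and the function names are hypothetical. Each worker still batches its own stream, so a large entity can span several of that worker's batches; the worker's id mapping (see `tx-worker!`) covers that case.

```clojure
(defn worker-index
  "Deterministically assigns an entity id to one of n workers."
  [e n]
  (mod (hash e) n))

(defn route-and-batch
  "Returns {worker-index [batch ...]}: datoms are grouped by worker first,
  then each worker's datoms are cut into batches of at most max-batch-size."
  [datoms n max-batch-size]
  (let [by-worker (group-by #(worker-index (:e %) n) datoms)]
    (into {}
          (for [w (range n)]
            [w (mapv vec (partition-all max-batch-size (get by-worker w [])))]))))
```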
(collect-results! result-ch debug init-state)
Collects results from workers and merges them into final state. Logs progress every 5 transactions when debug is enabled. Returns when result-ch is closed.
(establish-composite-tuple! conn {:keys [attr batch-size]})
Reasserts all values of attr, in batches of batch-size. This will establish values for any composite attributes built from attr.
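The reassertion itself is ordinary list-form transaction data. The sketch below only builds the batches. In the library, the values come from the database and each batch is transacted against `conn`. The helper name and the `[e v]` input shape are assumptions.

```clojure
(defn reassert-batches
  "Builds one vector of [:db/add e attr v] per transaction, batch-size
  assertions each. Per this function's docstring, reasserting a component
  attribute is what establishes composite tuples built from it.
  ASSUMPTION: values arrive as [e v] pairs."
  [attr e+v-pairs batch-size]
  (->> e+v-pairs
       (map (fn [[e v]] [:db/add e attr v]))
       (partition-all batch-size)
       (map vec)))

(reassert-batches :person/email [[1 "a@x.org"] [2 "b@x.org"] [3 "c@x.org"]] 2)
;; => ([[:db/add 1 :person/email "a@x.org"] [:db/add 2 :person/email "b@x.org"]]
;;     [[:db/add 3 :person/email "c@x.org"]])
```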
(monitored-chan! ch
{:keys [runningf channel-name every-ms] :or {every-ms 10000}})
(partition-attributes-by-ref ident->schema)
Partitions attribute eids into :non-ref and :ref based on their value types.

Non-ref attributes include:
- All non-ref, non-tuple attributes
- Tuples that don't contain any ref types

Ref attributes include:
- Direct :db.type/ref attributes
- Tuples containing at least one ref type (composite, heterogeneous, or homogeneous)
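The classification rules above map directly onto Datomic's tuple schema attributes:

- `:db/valueType`
- `:db/tupleType` (homogeneous)
- `:db/tupleTypes` (heterogeneous)
- `:db/tupleAttrs` (composite)

This sketch assumes `ident->schema` maps each attribute ident to a map holding those keys as plain keywords, plus `:db/id`. The real function's input shape may differ, and the function names are hypothetical.

```clojure
(defn ref-typed?
  "True when an attribute is a ref or a tuple with at least one ref component.
  ASSUMPTION: schema values are plain keywords such as :db.type/ref."
  [ident->schema {:db/keys [valueType tupleType tupleTypes tupleAttrs]}]
  (boolean
   (or (= valueType :db.type/ref)
       (= tupleType :db.type/ref)                        ; homogeneous tuple
       (some #{:db.type/ref} tupleTypes)                 ; heterogeneous tuple
       (some #(= :db.type/ref (get-in ident->schema [% :db/valueType]))
             tupleAttrs))))                              ; composite tuple

(defn split-attrs-by-ref
  "Returns {:non-ref [eid ...] :ref [eid ...]}."
  [ident->schema]
  (let [groups (group-by #(if (ref-typed? ident->schema %) :ref :non-ref)
                         (vals ident->schema))]
    {:non-ref (mapv :db/id (:non-ref groups))
     :ref     (mapv :db/id (:ref groups))}))
```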
(pass1-parallel-copy {:keys [source-db dest-conn schema-lookup max-batch-size
debug read-parallelism read-chunk tx-parallelism
init-state attrs]})
Parallel transaction pipeline for Pass 1 (non-ref datoms).

Since Pass 1 datoms have no ref dependencies, we can transact batches in parallel without coordination between transactions.

Uses entity-partitioned batching to ensure all datoms for the same entity are routed to the same worker, preventing duplicate entity creation when max-batch-size splits an entity's datoms across multiple batches.

Pipeline:
- [Datom Reader] → [Partitioned Batcher]
- [Partitioned Batcher] → one channel per worker: [Worker 0 Ch] → [Worker 0], [Worker 1 Ch] → [Worker 1], …, [Worker N Ch] → [Worker N]
- [Worker 0] … [Worker N] → [Results Collector]
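The fan-out and fan-in can be sketched with a fixed thread pool instead of channels. Here `process-batch` and `merge-results` are hypothetical stand-ins for what `tx-worker!` and `collect-results!` do. Each worker folds its own batches strictly in order, which lets it carry per-entity state, while different workers run concurrently.

```clojure
(import '(java.util.concurrent Callable ExecutorService Executors Future))

(defn run-partitioned
  "worker->batches: {worker-id [batch ...]}. Runs one task per worker on a
  fixed pool; each task reduces its batches in order with process-batch,
  starting from {}. The per-worker results are merged into init-state."
  [worker->batches process-batch merge-results init-state]
  (let [^ExecutorService pool
        (Executors/newFixedThreadPool (max 1 (count worker->batches)))]
    (try
      (let [futures (mapv (fn [[_ batches]]
                            (.submit pool ^Callable
                                     (fn [] (reduce process-batch {} batches))))
                          worker->batches)]
        (reduce merge-results init-state (map (fn [^Future f] (.get f)) futures)))
      (finally
        (.shutdown pool)))))

(run-partitioned {0 [[:a :b] [:c]] 1 [[:d :e]]}
                 (fn [acc batch] (update acc :written (fnil + 0) (count batch)))
                 (partial merge-with +)
                 {:written 0})
;; => {:written 5}
```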
(process-single-batch
{:keys [old-id->new-id pending-index dest-conn eid->schema debug] :as acc}
batch)
Process a single batch of datoms, updating the accumulator state.
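For the single-pass strategy, a pending index can be sketched as a map from an unresolved target id to the ref datoms waiting on it. The key choice and both function names are assumptions for illustration. The docstring only says that the batch updates the accumulator, which carries `old-id->new-id` and `pending-index`.

```clojure
(defn add-pending
  "Indexes a ref datom under the old entity id it is waiting for (its :v).
  ASSUMPTION: the pending index is keyed by the missing target id."
  [pending-index {:keys [v] :as datom}]
  (update pending-index v (fnil conj []) datom))

(defn release-pending
  "Splits the index into datoms whose targets now have new ids (:ready)
  and those still waiting (:pending-index)."
  [pending-index old-id->new-id]
  (let [ready-ks (filter #(contains? old-id->new-id %) (keys pending-index))]
    {:ready         (vec (mapcat pending-index ready-ks))
     :pending-index (apply dissoc pending-index ready-ks)}))
```

After each transaction, released datoms can be added to the next batch. The size of what stays in the index corresponds to the `:pending-count` metric.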
(read-datoms-in-parallel source-db
{:keys [attrs dest-ch parallelism read-chunk debug]})
(restore {:keys [source-db dest-conn tx-parallelism] :as argm})
Restores a database by copying schema and current datom state.

Options:
- :tx-parallelism - Number of parallel transaction workers for Pass 1 in two-pass mode.
(restore-two-pass {:keys [attrs schema-lookup tx-parallelism init-state]
:as argm})
Two-pass restore: non-ref datoms first, then ref datoms.

This approach eliminates most pending overhead by ensuring all entities exist before processing ref attributes.

Pass 1: Process all non-ref attributes
- Zero pending overhead (no ref dependencies)
- Can use parallel transactions

Pass 2: Process all ref attributes
- Most refs resolve immediately
- Always sequential (due to potential circular refs)
(tempids->old-id-mapping tempids allowed-ks)
Extracts old-id->new-id mapping from transaction result's :tempids. Tempid keys are strings representing old entity IDs.
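Because the tempid keys are old entity ids rendered as strings, the mapping amounts to parsing keys back to longs. This sketch assumes `allowed-ks` is a set of tempid strings to keep; the real function's contract for `allowed-ks` is not documented here, and the name `tempids->old-ids` is hypothetical.

```clojure
(defn tempids->old-ids
  "Turns a transaction result's :tempids ({\"<old-id>\" new-id}) into
  {old-id new-id}, keeping only tempids in allowed-ks.
  ASSUMPTION: allowed-ks is a set of tempid strings."
  [tempids allowed-ks]
  (into {}
        (keep (fn [[k new-id]]
                (when (contains? allowed-ks k)
                  [(Long/parseLong k) new-id])))
        tempids))

(tempids->old-ids {"100" 17592186045418 "200" 17592186045419} #{"100"})
;; => {100 17592186045418}
```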
(thread-factory name-prefix)
Creates a ThreadFactory that names threads with the given prefix and an incrementing counter.
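A `ThreadFactory` of this kind can be written with `reify` and an `AtomicLong`. The exact name format is an assumption in this sketch: the prefix directly followed by a counter that starts at 0.

```clojure
(import '(java.util.concurrent ThreadFactory)
        '(java.util.concurrent.atomic AtomicLong))

(defn thread-factory
  "Returns a ThreadFactory whose threads are named name-prefix followed by a
  counter starting at 0 (separator and start value are assumptions)."
  [name-prefix]
  (let [counter (AtomicLong. 0)]
    (reify ThreadFactory
      (newThread [_ runnable]
        (doto (Thread. ^Runnable runnable)
          (.setName (str name-prefix (.getAndIncrement counter))))))))
```

Such a factory would typically be passed to `(Executors/newFixedThreadPool n factory)`, so that worker threads show up with readable names in thread dumps.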
(tx-worker! work-ch result-ch dest-conn eid->schema debug worker-id)
Worker that processes batches from work-ch and sends results to result-ch. For Pass 1 non-ref datoms only - no pending logic needed. Maintains old-id->new-id state across batches to handle entities split across batches.
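Carrying the `old-id->new-id` map between batches is what keeps a split entity from being created twice. Without it, a later batch would reuse the string tempid `"1"`, and that transaction would mint a second entity. In the sketch below, `fake-transact` is a hypothetical stand-in for a Datomic transaction that simply assigns fresh ids to string tempids. All names are illustrative.

```clojure
(defn batch->tx-data
  "Turns one batch of non-ref datoms into list-form tx-data. Entities
  created by an earlier batch are addressed by their new id; unseen
  entities get a string tempid equal to their old id."
  [old-id->new-id batch]
  (mapv (fn [{:keys [e a v]}]
          [:db/add (get old-id->new-id e (str e)) a v])
        batch))

(defn fake-transact
  "Stand-in for a real transaction: assigns ids from next-id upward to each
  distinct string tempid and returns {:tempids {tempid new-id}}."
  [next-id tx-data]
  (let [temps (distinct (filter string? (map second tx-data)))]
    {:tempids (zipmap temps (iterate inc next-id))}))

(defn run-worker
  "Folds batches in order, carrying old-id->new-id from one to the next."
  [batches]
  (reduce (fn [{:keys [old-id->new-id next-id]} batch]
            (let [tx-data           (batch->tx-data old-id->new-id batch)
                  {:keys [tempids]} (fake-transact next-id tx-data)]
              {:old-id->new-id (into old-id->new-id
                                     (map (fn [[k id]] [(Long/parseLong k) id]))
                                     tempids)
               :next-id        (+ next-id (count tempids))}))
          {:old-id->new-id {} :next-id 1000}
          batches))

;; Entity 1 is split across two batches but still maps to a single new id.
(run-worker [[{:e 1 :a :p/name :v "A"} {:e 1 :a :p/age :v 3}]
             [{:e 1 :a :p/email :v "a@x"} {:e 2 :a :p/name :v "B"}]])
;; => {:old-id->new-id {1 1000, 2 1001}, :next-id 1002}
```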
cljdoc builds & hosts documentation for Clojure/Script libraries