dev.kwill.datomic-backup.current-state-restore


Clojure only.

Current state restore for Datomic databases.

Restores a database by copying schema and current datom state (no history)
from a source database to a destination database.

## Restore Strategies

### Single-Pass (default, `:two-pass? false`)
Processes all attributes together, managing ref dependencies with a pending index.
Best for databases with few ref attributes or simple dependency graphs.
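The pending index idea can be sketched in plain Clojure. This is a rough illustration, not the library's implementation: it assumes datoms are `[e a v]` vectors, that `ref-attrs` is a set of ref attribute idents, and that `old-id->new-id` maps source entity ids to destination ids. `split-resolvable` and `batch-efficiency` are hypothetical helper names, and the efficiency formula is only one plausible reading of the `:batch-efficiency` metric.

```clojure
;; Sketch only: hypothetical helpers, not functions from this namespace.
;; A ref datom is treated as writable once its target entity has a
;; destination id; otherwise it waits in the pending set.
(defn split-resolvable
  [datoms ref-attrs old-id->new-id]
  (let [ready? (fn [[_e a v]]
                 (or (not (contains? ref-attrs a))
                     (contains? old-id->new-id v)))]
    {:ready   (filterv ready? datoms)
     :pending (filterv (complement ready?) datoms)}))

;; Assumed definition: share of the batch that could be written, in percent.
(defn batch-efficiency
  [{:keys [ready pending]}]
  (let [total (+ (count ready) (count pending))]
    (if (zero? total)
      100.0
      (* 100.0 (/ (count ready) total)))))
```

With many ref attributes, more datoms land in `:pending` per batch, which is the overhead the two-pass strategy is meant to avoid.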

### Two-Pass (`:two-pass? true`)
Processes non-ref attributes first, then ref attributes second.
Benefits:
- Eliminates most pending overhead (pass 1 has zero pending)
- Near 100% batch efficiency in pass 1
- Minimal pending in pass 2 (only circular refs)
- Expected speedup: 1.5-2.5x for databases with many ref attributes
- Pass 1 supports parallel transactions (via `:tx-parallelism`)

## Performance Tuning

The restore process has these main performance knobs:

- `:max-batch-size` (default 500) - Datoms per transaction
  Higher = fewer transactions, faster overall

- `:read-parallelism` (default 20) - Parallel attribute reads
  Higher = faster reads

- `:read-chunk` (default 5000) - Datoms per read chunk
  Higher = fewer read calls

- `:tx-parallelism` (default 4) - Parallel transaction workers for Pass 1
  Only used when `:two-pass? true`
  Higher = more concurrent transactions (limited by Datomic transactor throughput)
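
As a starting point, the knobs above might be collected into one options map. The sketch below assumes the tuning options travel in the same argument map as `:source-db` and `:dest-conn`; `restore-options` and `restore-argm` are hypothetical names, and the values simply repeat the documented defaults.

```clojure
;; Sketch only: option names come from the docstring above; the values are
;; the documented defaults, shown for illustration rather than as advice.
(def restore-options
  {:two-pass?        true
   :max-batch-size   500
   :read-parallelism 20
   :read-chunk       5000
   :tx-parallelism   4      ; only used when :two-pass? is true
   :debug            true})

(defn restore-argm
  "Hypothetical helper: merges tuning options with the source db value and
  the destination connection."
  [source-db dest-conn options]
  (merge options {:source-db source-db :dest-conn dest-conn}))

;; Assumed usage, given a source db value and a destination connection:
;; (restore (restore-argm source-db dest-conn restore-options))
```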

## Performance Metrics

When `:debug true`, logs every 10 batches with these metrics:

- `:pending-count` - Datoms waiting due to ref dependencies
  High count → increase `:max-batch-size` or try `:two-pass? true`

- `:last-tx-ms` / `:avg-tx-ms` - Transaction timing
  High values → Datomic writes are slow, may be at limit

- `:batch-efficiency` - Percentage of batch actually written
  Low (< 80%) → increase `:max-batch-size` to resolve more refs

- `:utilization-pct` - Channel buffer utilization (every 10s)
  High (> 80%) → writes are bottleneck (good!)
  Low (< 20%) → reads are slow, increase `:read-parallelism`

At completion:
- `:avg-tx-ms` - Overall average transaction time
- `:final-pending-count` - Should be 0
- `:duration-sec` - Total time to read all datoms
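
The thresholds above can be turned into a small checklist. The sketch below is a hypothetical helper (`tuning-hints` is not part of this namespace) and assumes both metrics are reported as percentages on a 0 to 100 scale.

```clojure
;; Sketch only: encodes the documented thresholds as plain data checks.
(defn tuning-hints
  [{:keys [batch-efficiency utilization-pct]}]
  (cond-> []
    (and batch-efficiency (< batch-efficiency 80))
    (conj "Low batch efficiency: increase :max-batch-size")

    (and utilization-pct (> utilization-pct 80))
    (conj "High buffer utilization: writes are the bottleneck")

    (and utilization-pct (< utilization-pct 20))
    (conj "Low buffer utilization: reads are slow, increase :read-parallelism")))
```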


-full-copy^clj

(-full-copy {:keys [source-db schema-lookup dest-conn max-batch-size debug
                    read-parallelism read-chunk init-state attrs]})


<anom!!^clj

(<anom!! ch)


add-tuple-attrs!^clj

(add-tuple-attrs! {:keys [dest-conn tuple-schema]})

Adds tuple attributes to schema and establishes their composite values.


anom!^clj

(anom! x)


batch-datoms!^clj

(batch-datoms! datom-ch work-ch max-batch-size eid->schema max-bootstrap-tx)

Batches datoms from input channel into max-batch-size batches.
Filters out schema and bootstrap datoms.
Closes work-ch when input channel closes.
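
The filter-and-batch step can be illustrated on a plain sequence instead of channels. The sketch below rests on several assumptions that the docstring does not confirm: datoms are `[e a v tx]` vectors, a schema datom is one whose entity id is a key of `eid->schema`, and a bootstrap datom is one whose transaction id is at or below `max-bootstrap-tx`. `batch-datoms` here is a hypothetical stand-in, not the channel-based function.

```clojure
;; Sketch only: sequence-based stand-in for the channel pipeline.
;; ASSUMPTION: schema datoms are identified by entity id, bootstrap datoms
;; by transaction id; the real filter may differ.
(defn batch-datoms
  [datoms max-batch-size eid->schema max-bootstrap-tx]
  (->> datoms
       (remove (fn [[e _a _v tx]]
                 (or (contains? eid->schema e)
                     (<= tx max-bootstrap-tx))))
       (partition-all max-batch-size)
       (mapv vec)))
```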


batch-datoms-partitioned!^clj

(batch-datoms-partitioned! datom-ch
                           worker-channels
                           max-batch-size
                           eid->schema
                           max-bootstrap-tx
                           debug)

Batches datoms from input channel and routes to worker channels based on entity ID.
Uses consistent hashing to ensure all datoms for the same entity go to the same worker.
This keeps each entity on a single worker even when max-batch-size is smaller than the entity's datom count and its datoms span several batches.
Filters out schema and bootstrap datoms.
Closes all worker channels when input channel closes.
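
Entity-based routing can be sketched with a plain hash-modulo scheme. This is a simplification named plainly: it is not a consistent-hashing ring and not necessarily what the library does. It assumes datoms are vectors whose first element is the entity id; `worker-index` and `route-datoms` are hypothetical names.

```clojure
;; Sketch only: hash-modulo routing, a simpler stand-in for the routing
;; described above.
(defn worker-index
  "Maps an entity id to a worker in [0, worker-count)."
  [eid worker-count]
  (mod (hash eid) worker-count))

(defn route-datoms
  "Groups datoms by the worker their entity id maps to."
  [datoms worker-count]
  (group-by (fn [[e]] (worker-index e worker-count)) datoms))
```

Because the worker is a pure function of the entity id, every datom of a given entity reaches the same worker no matter how the batches are cut.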


collect-results!^clj

(collect-results! result-ch debug init-state)

Collects results from workers and merges them into final state.
Logs progress every 5 transactions when debug is enabled.
Returns when result-ch is closed.


copy-schema!^clj

(copy-schema! {:keys [dest-conn schema-lookup attrs]})


establish-composite-tuple!^clj

(establish-composite-tuple! conn {:keys [attr batch-size]})

Reasserts all values of attr, in batches of batch-size.
This will establish values for any composite attributes built from attr.
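
The reassertion step amounts to rebuilding `[:db/add e attr v]` assertions in batches. The sketch below covers only that data shaping, assuming the current datoms for `attr` are available as `[e a v]` vectors; reading them from the database and transacting each batch against `conn` are left out. `reassert-batches` is a hypothetical name.

```clojure
;; Sketch only: builds batched tx-data; no database access is performed.
(defn reassert-batches
  [datoms attr batch-size]
  (->> datoms
       (map (fn [[e _a v]] [:db/add e attr v]))
       (partition-all batch-size)
       (mapv vec)))
```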


monitored-chan!^clj

(monitored-chan! ch
                 {:keys [runningf channel-name every-ms] :or {every-ms 10000}})


partition-attributes-by-ref^clj

(partition-attributes-by-ref ident->schema)

Partitions attribute eids into `:non-ref` and `:ref` based on their value types.

Non-ref attributes include:
- All non-ref, non-tuple attributes
- Tuples that don't contain any ref types

Ref attributes include:
- Direct `:db.type/ref` attributes
- Tuples containing at least one ref type (composite, heterogeneous, or homogeneous)
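
The classification rules above can be sketched as follows. The sketch assumes `ident->schema` maps attribute idents to maps in which `:db/valueType`, `:db/tupleType`, and `:db/tupleTypes` hold plain keywords and `:db/tupleAttrs` holds attribute idents; the real schema lookup may use a different shape. For brevity it partitions idents rather than eids. `ref-attr?` and `partition-idents-by-ref` are hypothetical names.

```clojure
;; Sketch only: classifies attributes by whether they carry a ref value.
(defn ref-attr?
  [ident->schema ident]
  (let [{:db/keys [valueType tupleAttrs tupleTypes tupleType]}
        (get ident->schema ident)]
    (cond
      (= valueType :db.type/ref) true

      (= valueType :db.type/tuple)
      (boolean
        (or (= tupleType :db.type/ref)                       ; homogeneous
            (some #{:db.type/ref} tupleTypes)                ; heterogeneous
            (some #(ref-attr? ident->schema %) tupleAttrs))) ; composite

      :else false)))

(defn partition-idents-by-ref
  [ident->schema]
  (let [groups (group-by #(if (ref-attr? ident->schema %) :ref :non-ref)
                         (keys ident->schema))]
    {:non-ref (set (:non-ref groups))
     :ref     (set (:ref groups))}))
```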


partition-schema-by-deps^clj

(partition-schema-by-deps source-schema)


pass1-parallel-copy^clj

(pass1-parallel-copy {:keys [source-db dest-conn schema-lookup max-batch-size
                             debug read-parallelism read-chunk tx-parallelism
                             init-state attrs]})

Parallel transaction pipeline for Pass 1 (non-ref datoms).

Since Pass 1 datoms have no ref dependencies, we can transact
batches in parallel without coordination between transactions.

Uses entity-partitioned batching to ensure all datoms for the same
entity are routed to the same worker, preventing duplicate entity creation
when max-batch-size splits an entity's datoms across multiple batches.

Pipeline:
[Datom Reader] → [Partitioned Batcher] → [Worker 0 Ch] → [Worker 0]
                                        → [Worker 1 Ch] → [Worker 1]
                                        → [Worker N Ch] → [Worker N]
                                        → [Results Collector]


process-single-batch^clj

(process-single-batch
  {:keys [old-id->new-id pending-index dest-conn eid->schema debug] :as acc}
  batch)

Process a single batch of datoms, updating the accumulator state.


read-datoms-in-parallel^clj

(read-datoms-in-parallel source-db
                         {:keys [attrs dest-ch parallelism read-chunk debug]})


read-datoms-to-chan!^clj

(read-datoms-to-chan! db argm dest-ch)


resolve-datom^clj

(resolve-datom datom eid->schema old-id->new-id)


restore^clj

(restore {:keys [source-db dest-conn tx-parallelism] :as argm})

Restores a database by copying schema and current datom state.

Options:
- `:tx-parallelism` - Number of parallel transaction workers for Pass 1 in two-pass mode.


restore-two-pass^clj

(restore-two-pass {:keys [attrs schema-lookup tx-parallelism init-state]
                   :as argm})

Two-pass restore: non-ref datoms first, then ref datoms.

This approach eliminates most pending overhead by ensuring all entities
exist before processing ref attributes.

Pass 1: Process all non-ref attributes
- Zero pending overhead (no ref dependencies)
- Can use parallel transactions

Pass 2: Process all ref attributes
- Most refs resolve immediately
- Always sequential (due to potential circular refs)
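
The split between the two passes can be pictured on plain data. The sketch below assumes datoms are `[e a v]` vectors and `ref-attrs` is a set of ref attribute idents; `split-passes` is a hypothetical name, and the real function works per attribute against the source database rather than on an in-memory vector.

```clojure
;; Sketch only: separates non-ref datoms (pass 1) from ref datoms (pass 2).
(defn split-passes
  [datoms ref-attrs]
  (let [ref-datom? (fn [[_e a _v]] (contains? ref-attrs a))]
    {:pass-1 (filterv (complement ref-datom?) datoms)
     :pass-2 (filterv ref-datom? datoms)}))
```

Once pass 1 has created every entity that owns a non-ref datom, most ref targets in pass 2 already have a destination id, so few ref datoms have to wait.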


tempids->old-id-mapping^clj

(tempids->old-id-mapping tempids allowed-ks)

Extracts old-id->new-id mapping from transaction result's :tempids.
Tempid keys are strings representing old entity IDs.
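
A minimal sketch of this mapping, assuming `tempids` is a map from tempid strings to new entity ids and `allowed-ks` is a set of the tempid strings to keep (the actual type of `allowed-ks` is not stated in the docstring). `tempids->old-ids` is a hypothetical name.

```clojure
;; Sketch only: parses string tempids back into the old entity ids.
;; ASSUMPTION: allowed-ks is a set of tempid strings.
(defn tempids->old-ids
  [tempids allowed-ks]
  (into {}
        (keep (fn [[tempid new-id]]
                (when (contains? allowed-ks tempid)
                  [(Long/parseLong tempid) new-id])))
        tempids))
```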


thread-factory^clj

(thread-factory name-prefix)

Creates a ThreadFactory that names threads with the given prefix and an incrementing counter.
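
One way such a factory might look, using only `java.util.concurrent.ThreadFactory` from the JDK. The name format (`prefix-N`, counting from 0) is an assumption, and `named-thread-factory` is a hypothetical name rather than the function documented here.

```clojure
;; Sketch only: the "<prefix>-<n>" naming scheme is assumed, not confirmed.
(defn named-thread-factory
  [name-prefix]
  (let [counter (atom -1)]
    (reify java.util.concurrent.ThreadFactory
      (newThread [_ runnable]
        ;; Thread(Runnable, String) sets the thread name at construction.
        (Thread. ^Runnable runnable
                 (str name-prefix "-" (swap! counter inc)))))))
```

Named threads make it easier to tell reader and transaction workers apart in thread dumps and logs.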


tx-worker!^clj

(tx-worker! work-ch result-ch dest-conn eid->schema debug worker-id)

Worker that processes batches from work-ch and sends results to result-ch.
For Pass 1 non-ref datoms only - no pending logic needed.
Maintains old-id->new-id state across batches to handle entities split across batches.


txify-datoms^clj

(txify-datoms datoms pending-index eid->schema old-id->new-id)

