Vector API Usage (Java & Clojure Glue)

This document summarizes how the Datalevin Java bindings interact with the native USearch–LMDB integration layer, covering the lifecycle of vector domains, staging mutations, and coordinating multi-process recovery.

Prerequisites

The native dtlv_usearch shared library must be built and available on the library path. Running ./script/build (Linux) or ./script/build-macos (macOS) produces the required artifacts.
Java clients link against the generated DTLV JavaCPP bindings under src/java/datalevin/dtlvnative/DTLV.java.
An LMDB environment must exist (created via DTLV.mdb_env_create, mdb_env_open, etc.) and be configured with an mdb_env_set_maxdbs value large enough to hold the per-domain usearch-* DBIs.

Clojure Integration Points

Loading: Clojure namespaces import the generated JavaCPP class via (import '[datalevin.dtlvnative DTLV]); the platform jar ships JNI binaries under resources/datalevin/dtlvnative/<platform>/ so Loader.load() works without additional setup.
Domain lifecycle: when the Clojure layer opens an LMDB environment for a DB, it should call dtlv_usearch_domain_open once per logical vector domain (e.g., per Datalevin index) and persist the returned pointer alongside the DB handle. Store init options exactly once (metric, scalar kind, dimensions) via dtlv_usearch_store_init_options and fail fast on DTLV_USEARCH_INCOMPATIBLE to trigger migrations.
Query path: before issuing vector searches inside a Datalevin read transaction, call dtlv_usearch_activate to obtain a handle, and refresh it with dtlv_usearch_refresh(handle, readTxn) so the handle matches the LMDB snapshot. If the runtime tracks reader pins, also call dtlv_usearch_pin_handle when a read txn starts and dtlv_usearch_release_pin when it ends; this keeps compaction/checkpoint schedulers aware of active readers.
Write path: Datalevin write transactions should stage mutations using dtlv_usearch_stage_update on the same LMDB write transaction, then call dtlv_usearch_apply_pending before the LMDB commit and dtlv_usearch_publish_log immediately after commit. This preserves LMDB’s single-writer semantics and guarantees WAL promotion/replay across processes.
Maintenance hooks: background tasks in the Clojure layer (e.g., periodic checkpoints or compaction) should invoke dtlv_usearch_checkpoint_write_snapshot/_finalize and dtlv_usearch_compact while honoring DTLV_USEARCH_BUSY and MDB_MAP_FULL responses. On startup, run dtlv_usearch_checkpoint_recover so torn checkpoints or orphaned WAL files are cleaned up before the DB is served.
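
The write-path ordering described above (stage, apply pending, commit, publish) can be sketched independently of the native bindings. In the sketch below, VectorWriteSequence and the step names are hypothetical stand-ins for the corresponding DTLV calls, and treating any non-zero return code as a failure is an assumption taken from the == 0 checks used throughout this document. It is not the bindings' actual API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.IntSupplier;

/** Hypothetical sequencer: each named step stands in for one native call. */
final class VectorWriteSequence {
    private VectorWriteSequence() {}

    /**
     * Runs the steps in the map's iteration order (pass a LinkedHashMap to
     * keep insertion order), stops at the first non-zero status, and always
     * runs the cleanup action, which stands in for closing the txn context.
     * Returns the names of the steps that ran when all of them succeed.
     */
    static List<String> run(Map<String, IntSupplier> steps, Runnable cleanup) {
        List<String> attempted = new ArrayList<>();
        try {
            for (Map.Entry<String, IntSupplier> step : steps.entrySet()) {
                attempted.add(step.getKey());
                int rc = step.getValue().getAsInt();
                if (rc != 0) {
                    // Assumption: non-zero means failure, as in the == 0 checks.
                    throw new IllegalStateException(step.getKey() + " failed, rc=" + rc);
                }
            }
            return attempted;
        } finally {
            cleanup.run();
        }
    }
}
```

In real code each supplier would wrap the matching DTLV call, registered in the order stage_update, apply_pending, commit, publish_log, so that a failure in an early step prevents the later ones from running.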

Creating and Initializing a Vector Domain

DTLV.MDB_env env = new DTLV.MDB_env();
expect(DTLV.mdb_env_create(env) == 0, "Failed to create env");
expect(DTLV.mdb_env_set_maxdbs(env, 64) == 0, "Failed to set max DBs");
expect(DTLV.mdb_env_open(env, envPath, DTLV.MDB_NOLOCK, 0664) == 0,
        "Failed to open env");

DTLV.dtlv_usearch_domain domain = new DTLV.dtlv_usearch_domain();
expect(DTLV.dtlv_usearch_domain_open(env, "vectors", fsRoot, domain) == 0,
        "Failed to open domain");

DTLV.usearch_init_options_t opts = createOpts(dimensions);
DTLV.MDB_txn txn = new DTLV.MDB_txn();
expect(DTLV.mdb_txn_begin(env, null, 0, txn) == 0, "Failed to begin txn");
expect(DTLV.dtlv_usearch_store_init_options(domain, txn, opts) == 0,
        "Failed to store init opts");
expect(DTLV.mdb_txn_commit(txn) == 0, "Failed to commit init opts");

domain binds a logical vector space (vectors/usearch-* DBIs) to a filesystem root used for WAL directories and reader pins.
createOpts should populate metric, scalar kind, dimensions, and other USearch parameters. The helper in Test.java demonstrates typical defaults.
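
The snippets in this document call an expect helper that is not defined here. A minimal sketch follows, assuming it does nothing more than throw when the condition is false. The Checks class name and the choice of IllegalStateException are illustrative; the helper in Test.java may differ.

```java
/** Minimal sketch of the expect helper assumed by the snippets in this document. */
final class Checks {
    private Checks() {}

    /** Throws with the supplied message when the condition is false. */
    static void expect(boolean condition, String message) {
        if (!condition) {
            throw new IllegalStateException(message);
        }
    }
}
```

With a static import of Checks.expect, a call such as expect(rc == 0, "Failed to open env") turns a non-zero return code into an exception at the call site.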

Activating Handles and Running Queries

DTLV.dtlv_usearch_handle handle = new DTLV.dtlv_usearch_handle();
expect(DTLV.dtlv_usearch_activate(domain, handle) == 0, "activate failed");

DTLV.usearch_index_t index = DTLV.dtlv_usearch_handle_index(handle);
PointerPointer<BytePointer> error = new PointerPointer<>(1);
error.put(0, (BytePointer) null);
long found = DTLV.usearch_search(index,
        queryVector,
        DTLV.usearch_scalar_f32_k,
        k,
        keysPointer,
        distancesPointer,
        error);
expectNoError(error, "search failed");

Every thread/process needs to call dtlv_usearch_activate before issuing USearch queries; this streams the latest LMDB snapshot and delta tail into an in-memory handle.
When the process shuts down and no further queries will run, call DTLV.dtlv_usearch_deactivate(handle) to release the USearch index.
Use DTLV.dtlv_usearch_refresh(handle, txn) (with an LMDB read transaction) to catch up to newer log_seq values if a process lags behind.

Staging Vector Updates

DTLV.MDB_txn txn = new DTLV.MDB_txn();
expect(DTLV.mdb_txn_begin(env, null, 0, txn) == 0, "begin write txn failed");

DTLV.dtlv_usearch_update update = new DTLV.dtlv_usearch_update();
update.op(DTLV.DTLV_USEARCH_OP_ADD);
update.key(keyPointer);
update.key_len(Long.BYTES);
update.payload(payloadPointer);
update.payload_len(dimensions * Float.BYTES);
update.scalar_kind((byte) DTLV.usearch_scalar_f32_k);
update.dimensions((short) dimensions);

// Stage the mutation on the same LMDB write transaction.
DTLV.dtlv_usearch_txn_ctx ctx = new DTLV.dtlv_usearch_txn_ctx();
expect(DTLV.dtlv_usearch_stage_update(domain, txn, update, ctx) == 0,
        "stage update failed");
// Seal the WAL before the LMDB commit.
expect(DTLV.dtlv_usearch_apply_pending(ctx) == 0,
        "apply pending failed");
expect(DTLV.mdb_txn_commit(txn) == 0, "LMDB commit failed");
// Publish immediately after the commit so in-process handles catch up.
expect(DTLV.dtlv_usearch_publish_log(ctx, 1) == 0,
        "publish log failed");
DTLV.dtlv_usearch_txn_ctx_close(ctx);

stage_update buffers the WAL payload and writes the serialized delta (key + raw vector bytes) into the usearch-delta DBI inside the caller’s transaction.
apply_pending seals the WAL, writes delta rows, and updates metadata before the LMDB commit.
publish_log replays the sealed WAL into all in-process USearch handles. If a process crashes before publish, the WAL stays on disk and will be replayed during the next dtlv_usearch_activate or explicit dtlv_usearch_publish_log.
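
The payload and payload_len fields above describe a raw float vector of dimensions * Float.BYTES bytes. One way to prepare such a buffer using only the JDK is sketched below. The VectorPayloads class and its method names are hypothetical, and the use of native byte order is an assumption, since this document does not state the expected byte order. Wrapping the buffer in a JavaCPP pointer is left out.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper for packing float vectors into raw byte payloads. */
final class VectorPayloads {
    private VectorPayloads() {}

    /** Packs a float vector into a direct buffer of length * Float.BYTES bytes. */
    static ByteBuffer pack(float[] vector) {
        ByteBuffer buf = ByteBuffer.allocateDirect(vector.length * Float.BYTES)
                .order(ByteOrder.nativeOrder()); // assumption: native byte order
        // Writing through the float view leaves buf's own position at 0.
        buf.asFloatBuffer().put(vector);
        return buf;
    }

    /** Reads a vector back, checking that the byte length matches the dimensions. */
    static float[] unpack(ByteBuffer payload, int dimensions) {
        if (payload.remaining() != dimensions * Float.BYTES) {
            throw new IllegalArgumentException(
                    "expected " + dimensions * Float.BYTES + " bytes, got " + payload.remaining());
        }
        float[] out = new float[dimensions];
        // duplicate() resets the byte order, so restore it before creating the view.
        payload.duplicate().order(payload.order()).asFloatBuffer().get(out);
        return out;
    }
}
```

The length check in unpack mirrors the relationship between payload_len and dimensions in the staging snippet, which makes a mismatched vector fail before it reaches the native layer.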

Checkpoints and Recovery

dtlv_usearch_checkpoint_write_snapshot(domain, index, snapshotSeq, writerUuid, chunkCountOut) serializes the current USearch index into chunked LMDB entries.
dtlv_usearch_checkpoint_finalize(domain, snapshotSeq, pruneLogSeq) atomically updates metadata, prunes deltas ≤ pruneLogSeq, and removes the checkpoint_pending key.
After a crash (or when adopting a stale environment), call dtlv_usearch_checkpoint_recover(domain) before activation. This inspects checkpoint_pending, sealed_log_seq, and on-disk WAL files to ensure any partial work is cleaned up before readers proceed.

Multi-Process Usage

Each JVM process must bring its own LMDB environment handle and open the same domain name. WAL files (<root>/pending/*.ulog*) coordinate cross-process updates; only one LMDB write transaction may run at a time due to LMDB’s single-writer semantics.
The Java harness (testUsearchJavaMultiProcessIntegration) demonstrates how to spawn writer/reader JVMs, including crash-before-publish and checkpoint-crash scenarios. Use ProcessBuilder (or your task runner of choice) to sequence these roles in real deployments.
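
When diagnosing cross-process behavior, it can help to see which WAL files are still sitting under <root>/pending. A minimal JDK-only sketch follows. The PendingWalFiles class is hypothetical, the glob mirrors the *.ulog* pattern mentioned above, and nothing here interprets file names or contents beyond that pattern.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical diagnostic helper: lists WAL files under <root>/pending. */
final class PendingWalFiles {
    private PendingWalFiles() {}

    /** Returns the sorted paths matching *.ulog*, or an empty list if the directory is absent. */
    static List<Path> list(Path filesystemRoot) {
        Path pending = filesystemRoot.resolve("pending");
        List<Path> found = new ArrayList<>();
        if (!Files.isDirectory(pending)) {
            return found;
        }
        // The glob is matched against each entry's file name.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(pending, "*.ulog*")) {
            for (Path p : stream) {
                found.add(p);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Collections.sort(found);
        return found;
    }
}
```

This only reads the directory listing, so it is safe to run alongside live writers, although the result may be stale by the time it is inspected.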

Implementation Notes

The multi-process helpers live in src/java/datalevin/dtlvnative/Test.java and are intended both as integration tests and as executable examples for Datalevin developers. The MultiProcessWorker class exposes writer, reader, crash-writer, and checkpoint-crash roles to make it easy to script various workflows.
During staging, payloads are copied into both the WAL frame and the LMDB delta row, so crash recovery can always replay deltas even if a host dies after LMDB commit but before the in-memory handles catch up. Once a checkpoint finalizes, the retained delta tail becomes small (only the gap since the latest snapshot).
Checkpoint crash handling relies on metadata entries (checkpoint_pending, per-chunk LMDB keys) to identify incomplete snapshots. Re-running dtlv_usearch_checkpoint_recover is idempotent; it removes torn chunks and clears checkpoint_pending so another process may restart the checkpoint.
USearch is the canonical store of vector bytes. LMDB only keeps them inside the delta log until a checkpoint finalizes, at which point old deltas are pruned. Operators should therefore back up both the LMDB environment and the USearch snapshots/WAL files together.
All file-system interactions (WAL directories and optional reader-pin state) sit under the filesystem_root you pass to dtlv_usearch_domain_open. Ensure the user running Datalevin has permission to read/write these paths, and include them in your backup/restore policies alongside the LMDB data file.

Operational Playbooks

Partial checkpoints: if dtlv_usearch_checkpoint_write_snapshot returns MDB_MAP_FULL, the checkpoint_pending marker remains set so you can increase the LMDB map size (mdb_env_set_mapsize), run dtlv_usearch_checkpoint_recover(domain) to clear the partial snapshot, and then retry the checkpoint/finalize sequence. Use checkpoint_pending as your signal that a previous attempt needs cleanup before scheduling another snapshot.
Pinned readers: the reader-pin LMDB environment lives at <filesystem_root>/reader-pins/reader-pins.mdb. If long-lived reads stall compaction/checkpoint planning, inspect that file (or call dtlv_usearch_touch_pin/release_pin) to see which (snapshot_seq, log_seq) is still pinned. Expired entries can be removed with dtlv_usearch_release_pin or by deleting the pins environment while the process is quiesced; the domain recreates it on open, and readers can refresh handles to advance to the latest snapshot.
Checksum fault injection: set DTLV_FAULT_WAL_CRC=1 to corrupt outgoing WAL frame CRCs or DTLV_FAULT_SNAPSHOT_CRC=1 to write bad snapshot chunk checksums. These toggles help exercise checksum failure paths without manual file mangling.
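
The partial-checkpoint playbook above can be sketched as a small retry loop. The CheckpointOps interface and its method names are hypothetical stand-ins for the native calls (checkpoint write_snapshot, finalize, recover, and mdb_env_set_mapsize). The doubling growth policy and the attempt limit are assumptions, not documented behavior. The value -30792 is LMDB's MDB_MAP_FULL return code.

```java
/** Hypothetical retry loop for checkpoints that hit MDB_MAP_FULL. */
final class CheckpointRetry {
    /** LMDB's MDB_MAP_FULL return code. */
    static final int MDB_MAP_FULL = -30792;

    /** Stand-ins for the native calls; 0 means success (assumed). */
    interface CheckpointOps {
        int writeSnapshot();
        int finalizeCheckpoint();
        int recover();
        int setMapSize(long bytes);
    }

    private CheckpointRetry() {}

    /** Returns the map size in effect when the checkpoint finally succeeded. */
    static long run(CheckpointOps ops, long mapSize, int maxAttempts) {
        long size = mapSize;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            int rc = ops.writeSnapshot();
            if (rc == 0) {
                check(ops.finalizeCheckpoint(), "finalize");
                return size;
            }
            if (rc != MDB_MAP_FULL) {
                throw new IllegalStateException("snapshot failed, rc=" + rc);
            }
            // Playbook order: grow the map, clear the partial snapshot, retry.
            size = Math.multiplyExact(size, 2L); // assumption: doubling policy
            check(ops.setMapSize(size), "set map size");
            check(ops.recover(), "recover");
        }
        throw new IllegalStateException("checkpoint still failing after " + maxAttempts + " attempts");
    }

    private static void check(int rc, String what) {
        if (rc != 0) {
            throw new IllegalStateException(what + " failed, rc=" + rc);
        }
    }
}
```

Bounding the number of attempts keeps a persistently full map from turning a background checkpoint task into an unbounded loop.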
