Copy-on-write (COW) branching semantics on top of Apache Lucene.
Provides fast forking (~3-5ms), structural sharing of immutable segments, branch-isolated indexing/searching, snapshot retention, and explicit GC.
Key concepts:
- Writer: mutable handle to a branch (one per branch per JVM)
- Snapshot: immutable DirectoryReader at a specific commit point
- Branch: COW overlay sharing base segments with the trunk
- GC: explicit cleanup of old snapshots respecting branch references
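The concepts above can be sketched as a short end-to-end session. This is a minimal illustration, not a definitive recipe: it assumes the functions documented on this page are referred into the current namespace, and the path argument, the branch names "main" and "feature", and the helper name fork-demo are all hypothetical.

```clojure
;; Sketch only: assumes create-index, add-doc, commit!, fork, num-docs and
;; close! (all documented on this page) are referred into this namespace.
(defn fork-demo
  "Creates an index at path, forks it, and returns the document counts
  seen by each branch. Branch names are illustrative assumptions."
  [path]
  (let [main (create-index path "main")]
    (try
      (add-doc main {:title "Hello World" :body "Some text"})
      (commit! main "initial import")
      (let [feature (fork main "feature")]
        (try
          ;; Writes go to the fork; the parent's segments are shared, not copied.
          (add-doc feature {:title "Only on feature"})
          (commit! feature "feature-only change")
          {:main    (num-docs main)
           :feature (num-docs feature)}
          (finally
            (close! feature))))
      (finally
        (close! main)))))
```

If branches are isolated as described, the two counts would be expected to differ after the second commit.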
(add-doc writer doc-map)
Add a document to the branch.
doc-map is a map of field-name -> value.
Options per field can be specified as {:value v :stored? bool :type :text|:string|:vector}
Simple usage:
  (add-doc writer {:title "Hello World" :body "Some text"})
Advanced usage:
  (add-doc writer {:id {:value "doc1" :type :string}
                   :title {:value "Hello" :type :text :stored? true}
                   :embedding {:value (float-array [0.1 0.2 ...]) :type :vector}})
(base-path writer)
Returns the base path of the index.
(branch-name writer)
Returns the branch name.
(close! writer)
Close a branch writer and its resources.
(commit! writer)
(commit! writer message)
Commit changes on a branch. Stores a timestamp in the commit user-data.
The optional message is stored for history/log purposes. Returns the commit generation.
(commit-available? writer generation)
Check if a specific commit generation is still available (not GC'd).
(create-index path branch-name)
(create-index path branch-name {:keys [analyzer]})
Create a new branched index at the given path.
On creation, discovers existing branches and protects their shared segments.
Options:
  :analyzer - the Lucene Analyzer to use (default: StandardAnalyzer)
Returns a BranchIndexWriter for the given branch.
(delete-docs writer field value)
Delete documents matching the given term field and value.
(discover-branches path)
Discover all branch names at the given path.
Returns a set of branch name strings.
(flush! writer)
Flush pending changes without committing. Flushed changes are visible to near-real-time (NRT) readers but are not durable.
(fork writer new-branch-name)
Fork the index into a new branch. Returns the new branch writer.
The new branch shares all existing segments with the parent. Cost: ~3-5ms (flush buffer + copy manifest).
(gc! writer before)
Garbage collect old commit points and unreferenced segment files.
Only callable on the main branch writer. Scans all branches to determine which files are still needed before removing anything.
before: a java.time.Instant; commits older than this are deleted.
Returns the number of commit points removed.
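As an illustration of the gc! contract above, the following sketch computes the cutoff Instant from a retention period and only calls gc! on the main branch writer. The helper name gc-older-than-days! is hypothetical, and the functions from this page are assumed to be referred into the current namespace.

```clojure
(import '(java.time Duration Instant))

;; Sketch only: gc! and main-branch? are the functions documented on this page.
(defn gc-older-than-days!
  "Removes commit points older than `days` days. Returns the number of
  commit points removed, or nil when `writer` is not the main branch."
  [writer days]
  (when (main-branch? writer)
    (let [cutoff (.minus (Instant/now) (Duration/ofDays days))]
      (gc! writer cutoff))))
```

Guarding with main-branch? turns the documented "main branch only" restriction into a nil result instead of relying on whatever gc! does when called on another branch.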
(list-snapshots writer)
List all available snapshots (commit points) for this branch.
Returns a vector of maps with :generation, :snapshot-id, :timestamp, :message, :branch, :segment-count, and :parent-ids.
(main-branch? writer)
Returns true if this is the main (trunk) branch.
(max-doc writer)
Returns the total number of documents, including deleted documents.
(merge-from! target source)
Merge segments from the source branch into the target branch.
Uses reader-based addIndexes to avoid lock conflicts with the source writer.
(num-docs writer)
Returns the number of documents in this branch, excluding deleted documents.
(open-branch path branch-name)
(open-branch path branch-name {:keys [analyzer]})
Open an existing branch writer (for out-of-process branch access).
Opens a BranchedDirectory with the base as read-only and overlay for writes.
Options:
  :analyzer - the Lucene Analyzer to use (default: StandardAnalyzer)
(open-reader-at writer generation)
Open a reader at a specific commit generation (time-travel).
The caller is responsible for closing the reader. Throws if the generation has been GC'd.
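Because open-reader-at throws for a GC'd generation and leaves closing to the caller, a time-travel read might be wrapped as below. This is a sketch under assumptions: the helper name doc-count-at is hypothetical, the returned reader is assumed to be a Lucene IndexReader (so .numDocs is available and with-open can close it), and the functions from this page are assumed to be referred into the current namespace.

```clojure
;; Sketch only: commit-available? and open-reader-at are documented on this page.
(defn doc-count-at
  "Returns the live document count as of commit `generation`, or nil when
  that generation is no longer available."
  [writer generation]
  (when (commit-available? writer generation)
    ;; with-open closes the reader even if the body throws.
    (with-open [reader (open-reader-at writer generation)]
      (.numDocs reader))))
```

The availability check is advisory: if gc! runs between the check and the open, open-reader-at can still throw, so callers that run GC concurrently may also want a try/catch.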
(search writer query)
(search writer query {:keys [limit fields] :or {limit 10}})
Search a branch. Returns a vector of maps with :doc-id, :score, and field values.
query can be:
- A Lucene Query object
- A map {:term [field value]} for a term query
- A string (matches all documents containing this term in any field)
Options:
  :limit - max results (default 10)
  :fields - fields to retrieve (default: all stored fields)
(snapshot writer)
Take an immutable snapshot (DirectoryReader) of the current branch. The caller is responsible for closing the reader.
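The three query forms accepted by search can be sketched side by side. Several details here are assumptions rather than documented behavior: field names are passed as strings (the page does not say whether keywords are also accepted), the helper name search-examples is hypothetical, and the functions from this page are assumed to be referred into the current namespace. TermQuery and Term are the standard Lucene classes.

```clojure
(import '(org.apache.lucene.index Term)
        '(org.apache.lucene.search TermQuery))

;; Sketch only: `search` is the function documented on this page.
(defn search-examples
  "Runs the same lookup through each documented query form."
  [writer]
  {;; 1. A Lucene Query object built directly.
   :lucene-query (search writer (TermQuery. (Term. "title" "hello")))
   ;; 2. The {:term [field value]} map form.
   :term-map     (search writer {:term ["title" "hello"]})
   ;; 3. A plain string, with the documented :limit and :fields options.
   :string       (search writer "hello" {:limit 5 :fields ["title"]})})
```

Lucene term queries match indexed tokens exactly, so with the default StandardAnalyzer (which lowercases text) a lowercase term such as "hello" is the safer choice for :text fields.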
(update-doc writer field value doc-map)
Update a document identified by the given term.
Replaces the document matching (field, value) with the new doc-map.
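A common use of update-doc is replacing a document by a unique identifier. The sketch below assumes an "id" field indexed with :type :string so that the whole value is matched as a single term; the field name, passing it as a string, and the helper name replace-by-id! are illustrative assumptions, and the functions from this page are assumed to be referred into the current namespace.

```clojure
;; Sketch only: update-doc and commit! are documented on this page.
(defn replace-by-id!
  "Replaces the document whose id field equals `id` with `fields`,
  re-adding the id so later updates can still find the document."
  [writer id fields]
  (update-doc writer "id" id
              (assoc fields :id {:value id :type :string :stored? true}))
  (commit! writer (str "update " id)))
```

Because the replacement is a whole-document swap, any field left out of the new doc-map is gone after the update.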
(with-snapshot writer f)
Execute f with an immutable snapshot reader. The reader is closed afterwards.
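The difference between snapshot and with-snapshot is who closes the reader. The sketch below shows both styles. It assumes the reader is a Lucene DirectoryReader as stated for snapshot, that with-snapshot returns the result of f (the page does not say so explicitly), and that the functions from this page are referred into the current namespace; the helper names are hypothetical.

```clojure
;; Sketch only: snapshot and with-snapshot are documented on this page.
(defn live-docs-managed
  "Lets with-snapshot close the reader after f returns."
  [writer]
  (with-snapshot writer (fn [reader] (.numDocs reader))))

(defn live-docs-manual
  "Takes the snapshot directly; with-open is responsible for closing it."
  [writer]
  (with-open [reader (snapshot writer)]
    (.numDocs reader)))
```

In either style the reader should not be used after the body returns, so any value derived from it needs to be fully realized inside the body rather than returned as a lazy sequence.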