Copy-on-write branching for Apache Lucene. Git-like snapshot and branch semantics on full-text search indices with structural sharing.
Built on Lucene 10.3.2. Forking a branch takes 3-5ms regardless of index size by sharing immutable segment files.
| Layer | Namespace | Use Case |
|---|---|---|
| Java | org.replikativ.scriptum.BranchIndexWriter | Direct Java usage |
| Core | scriptum.core | Low-level Clojure wrapper |
| Yggdrasil | scriptum.yggdrasil | High-level protocols |
For Clojure users: use scriptum.yggdrasil for the high-level API, or scriptum.core for lower-level control.
For Java users: use BranchIndexWriter directly.
For Maven (Gradle uses the same coordinates):
<dependency>
<groupId>org.replikativ</groupId>
<artifactId>scriptum</artifactId>
<version>0.1.1</version>
</dependency>
Java sources must be compiled before use:
clj -T:build compile-java
(require '[scriptum.core :as sc])
;; Create an index
(def writer (sc/create-index "/tmp/my-index"))
;; Add documents
(sc/add-doc writer {:title {:type :text :value "Hello World"}
                    :id    {:type :string :value "doc-1"}})
(sc/commit! writer "Initial commit")
;; Search
(sc/search writer {:match-all {}} 10)
;; => [{:title "Hello World", :id "doc-1", :score 1.0}]
;; Fork a branch
(def feature (sc/fork writer "experiment"))
;; Add to branch (doesn't affect main)
(sc/add-doc feature {:title {:type :text :value "Branch only"}
                     :id    {:type :string :value "doc-2"}})
(sc/commit! feature "Added experimental doc")
;; Main still has 1 doc, branch has 2
(count (sc/search writer {:match-all {}} 100)) ;; => 1
(count (sc/search feature {:match-all {}} 100)) ;; => 2
;; Merge branch back
(sc/merge-from! writer feature)
(sc/commit! writer "Merged experiment")
;; Cleanup
(sc/close! feature)
(sc/close! writer)
(sc/create-index path) ; create new index at path
(sc/open-branch path branch-name) ; open existing branch
(sc/fork writer "branch-name") ; fast fork from writer
(sc/close! writer) ; close writer and release resources
(sc/discover-branches path) ; => ["main" "feature" ...]
;; Accessors
(sc/num-docs writer) ; document count (excluding deletions)
(sc/max-doc writer) ; document count (including deletions)
(sc/branch-name writer) ; current branch name
(sc/base-path writer) ; index base path
(sc/main-branch? writer) ; true if this is the main branch
Field types: :text (analyzed, searchable), :string (exact match), :vector (float array for KNN).
(sc/add-doc writer {:title {:type :text :value "Searchable text"}
                    :tag   {:type :string :value "exact-match"}
                    :embed {:type :vector :value (float-array [0.1 0.2 0.3])
                            :dims 3}})
(sc/delete-docs writer :id "doc-1") ; delete by field+value
(sc/update-doc writer :id "doc-1" new-fields) ; atomic delete+add
(sc/commit! writer "commit message") ; persist changes
(sc/flush! writer) ; flush without new commit point
(sc/merge-from! writer source-writer) ; merge segments from another branch
(sc/list-snapshots writer)
;; => [{:generation 1 :uuid "..." :timestamp "..." :message "..." :branch "main"}
;; {:generation 2 :uuid "..." :timestamp "..." :message "..." :branch "main"}]
;; Term query
(sc/search writer {:term {:field "tag" :value "exact-match"}} 10)
;; Match-all
(sc/search writer {:match-all {}} 100)
;; Custom Lucene query object
(sc/search writer my-lucene-query 10)
;; Returns: [{:field1 "val" :field2 "val" :score 1.0} ...]
;; Get snapshot at specific generation
(def reader (sc/open-reader-at writer 1))
;; Check if a generation still exists (may be GC'd)
(sc/commit-available? writer 1) ; => true/false
;; Get current immutable snapshot
(def snap (sc/snapshot writer))
;; Execute with auto-closing snapshot
(sc/with-snapshot [reader writer]
(sc/search reader {:match-all {}} 10))
;; with-snapshot closes its own binding; this closes the reader from open-reader-at
(.close reader)
;; Remove commits older than 1 hour, respecting branch references
(sc/gc! writer)
GC only runs on the main branch and protects all segment files referenced by any branch.
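The protection rule above can be illustrated with a small, self-contained model. This is only a sketch and not scriptum's implementation: the `GcSketch` class, the `Commit` record, and the `deletableFiles` method are hypothetical names, and it assumes that commits are listed oldest to newest and that the newest commit is always kept.

```java
import java.time.Instant;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of branch-aware garbage collection. Hypothetical, not scriptum's code. */
public class GcSketch {

    /** A commit point: generation, creation time, and the segment files it references. */
    public record Commit(long generation, Instant timestamp, Set<String> files) {}

    /**
     * Returns the segment files that could be deleted when collecting commits
     * older than the cutoff.
     *
     * Assumptions: mainCommits is ordered oldest to newest, the newest commit is
     * always retained, and branchReferences holds the files each branch still reads.
     */
    public static Set<String> deletableFiles(List<Commit> mainCommits,
                                             List<Set<String>> branchReferences,
                                             Instant cutoff) {
        Set<String> live = new HashSet<>();
        Set<String> candidates = new HashSet<>();
        for (int i = 0; i < mainCommits.size(); i++) {
            Commit c = mainCommits.get(i);
            boolean newest = (i == mainCommits.size() - 1);
            if (newest || !c.timestamp().isBefore(cutoff)) {
                live.addAll(c.files());        // commit survives, so its files stay
            } else {
                candidates.addAll(c.files());  // commit is expired
            }
        }
        // Files referenced by any branch are protected, even if their commit expired.
        for (Set<String> refs : branchReferences) {
            live.addAll(refs);
        }
        candidates.removeAll(live);
        return candidates;
    }
}
```

In this model a file is removed only when no surviving commit and no branch refers to it, which is why a fork taken from an old generation keeps that generation's segments alive.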
For Java users, BranchIndexWriter provides the complete API:
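import org.replikativ.scriptum.BranchIndexWriter;
import org.apache.lucene.document.*;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.TopDocs;
import java.nio.file.Path;
import java.time.Duration;
import java.time.Instant;
import java.util.Set;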
// Create an index
BranchIndexWriter main = BranchIndexWriter.create(Path.of("/tmp/my-index"), "main");
// Add documents
Document doc = new Document();
doc.add(new TextField("title", "Hello World", Field.Store.YES));
doc.add(new StringField("id", "doc-1", Field.Store.YES));
main.addDocument(doc);
main.commit("Initial commit");
// Fork a branch (3-5ms regardless of index size)
BranchIndexWriter feature = main.fork("experiment");
// anotherDoc is a Document built the same way as doc above
feature.addDocument(anotherDoc);
feature.commit("Feature work");
// Search
DirectoryReader reader = main.openReader();
IndexSearcher searcher = new IndexSearcher(reader);
TopDocs results = searcher.search(new MatchAllDocsQuery(), 10);
reader.close();
// Merge branch back
main.mergeFrom(feature);
// Time travel - open reader at specific generation
DirectoryReader historical = main.openReaderAt(1);
// Garbage collect old commits
main.gc(Instant.now().minus(Duration.ofHours(1)));
// Discover branches
Set<String> branches = BranchIndexWriter.discoverBranches(Path.of("/tmp/my-index"));
// Cleanup
feature.close();
main.close();
| Method | Description |
|---|---|
| create(path, branchName) | Create new index |
| open(path, branchName) | Open existing branch |
| fork(branchName) | Fast fork (copies metadata only) |
| addDocument(doc) | Add a document |
| deleteDocuments(terms...) | Delete by terms |
| updateDocument(term, doc) | Atomic delete+add |
| commit() / commit(message) | Persist changes |
| openReader() | NRT reader (sees uncommitted) |
| openCommittedReader() | Reader on committed state |
| openReaderAt(generation) | Time travel to specific commit |
| isCommitAvailable(generation) | Check if commit still exists |
| listSnapshots() | Get all commit points |
| mergeFrom(source) | Merge another branch |
| gc(beforeInstant) | Garbage collect old commits |
| numDocs() / maxDoc() | Document counts |
| getBranchName() | Current branch name |
| isMainBranch() | Check if main branch |
Scriptum implements the Yggdrasil protocol stack (Snapshotable, Branchable, Graphable, Mergeable):
(require '[scriptum.yggdrasil :as sy]
'[yggdrasil.protocols :as p])
(def sys (sy/create "/tmp/my-index" {:system-name "search-index"}))
(p/branches sys) ; => #{:main}
(p/branch! sys :feature)
(p/checkout sys :feature)
;; ... add docs, commit ...
(p/merge! sys :main)
(p/history sys {:limit 10})
(sy/close! sys)
Passes the full yggdrasil compliance test suite (22 tests, 203 assertions).
Typical results:
On disk, scriptum uses this structure:
basePath/                  -- trunk (main branch)
  _0.cfs, _1.cfs, ...      -- shared segment files
  segments_N               -- main's commit points
  branches/
    feature/               -- branch overlay
      _10000.cfs, ...      -- branch-specific segments
      segments_N           -- branch's commit points
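To make the layout concrete, the sketch below maps a file path to the branch that owns it, using only the directory convention shown above. The `BranchLayoutSketch` class and its methods are hypothetical helpers for illustration; how scriptum itself discovers branches is not described here and may differ.

```java
import java.nio.file.Path;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Toy helper that reads branch ownership off the on-disk layout. Hypothetical. */
public class BranchLayoutSketch {

    /**
     * Returns the branch owning the given file, assuming files under
     * basePath/branches/NAME/ belong to branch NAME and every other file
     * under basePath belongs to the trunk, called "main".
     */
    public static String branchOf(Path basePath, Path file) {
        Path rel = basePath.relativize(file);
        if (rel.getNameCount() >= 3 && rel.getName(0).toString().equals("branches")) {
            return rel.getName(1).toString();
        }
        return "main";
    }

    /** Collects the distinct branch names implied by a list of file paths. */
    public static Set<String> branches(Path basePath, List<Path> files) {
        Set<String> names = new TreeSet<>();
        for (Path f : files) {
            names.add(branchOf(basePath, f));
        }
        return names;
    }
}
```

For example, `branchOf(Path.of("/tmp/my-index"), Path.of("/tmp/my-index/branches/feature/_10000.cfs"))` yields `"feature"`, while a file directly under `/tmp/my-index` maps to `"main"`.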
Branches share base segments via read-only references. Only new writes create branch-specific segment files.
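The sharing scheme can be modelled in a few lines of plain Java. This is a minimal sketch under stated assumptions, not scriptum's BranchedDirectory: the `CowBranchSketch` class is hypothetical, and segments are represented by file names only. It shows why a fork costs time proportional to the number of segment names rather than to the index size.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a copy-on-write branch. Hypothetical, not scriptum's implementation. */
public class CowBranchSketch {

    /** Segment files inherited at fork time; this list is immutable. */
    private final List<String> shared;

    /** Segment files written by this branch after it was created. */
    private final List<String> own = new ArrayList<>();

    private CowBranchSketch(List<String> shared) {
        this.shared = shared;
    }

    /** Creates an empty trunk with no inherited segments. */
    public static CowBranchSketch create() {
        return new CowBranchSketch(List.of());
    }

    /** A write adds a branch-specific segment; inherited segments are never touched. */
    public void write(String segmentFile) {
        own.add(segmentFile);
    }

    /** Forking copies only the list of segment names, never the file contents. */
    public CowBranchSketch fork() {
        return new CowBranchSketch(List.copyOf(visibleSegments()));
    }

    /** Everything a reader on this branch can see: inherited plus own segments. */
    public List<String> visibleSegments() {
        List<String> all = new ArrayList<>(shared);
        all.addAll(own);
        return all;
    }

    /** Only the segments this branch wrote itself. */
    public List<String> ownSegments() {
        return List.copyOf(own);
    }
}
```

Writes to the fork land in its own list, so the parent's view is unchanged, and later writes to the parent do not appear in the fork either, because the fork captured the segment names at fork time.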
See docs/LUCENE_EXTENSION.md for a deep-dive into how Scriptum extends Lucene.
src/
  clojure/scriptum/
    core.clj                      # Low-level COW branching API
    yggdrasil.clj                 # Yggdrasil protocol adapter
  java/org/replikativ/scriptum/
    BranchIndexWriter.java        # Branch-aware Lucene writer (main Java API)
    BranchedDirectory.java        # COW directory overlay
    BranchAwareMergePolicy.java   # Prevents merging shared segments
    BranchDeletionPolicy.java     # Retains all commits until GC
docs/
  LUCENE_EXTENSION.md             # Technical deep-dive
test/scriptum/
  core_test.clj                   # Unit tests
  yggdrasil_test.clj              # Compliance tests
# Compile Java sources
clj -T:build compile-java
# Run tests
clj -T:build compile-java && clj -M:test
# Start nREPL
clj -T:build compile-java && clj -M:repl
Copyright (c) 2026 Christian Weilbach
Licensed under the Eclipse Public License 2.0.