Complete guide to Proximum's crypto-hash feature for index auditability and integrity verification.
Proximum's crypto-hash feature enables cryptographic auditability for vector indices. When enabled, each sync operation computes a commit hash that chains to the previous commit and covers every vector and HNSW edge chunk in the index.
This is useful for compliance (HIPAA, GDPR), supply chain integrity, and ensuring reproducibility in ML systems.
✅ Audit Trail: Every change tracked with a cryptographic commit hash
✅ Tamper Detection: Detect any unauthorized modifications
✅ Backup Verification: Verify backups before restore without loading the index
✅ Reproducibility: Exact hash ensures a model's vector store state can be replicated
✅ Supply Chain: Verify data integrity when sharing indices between systems
import org.replikativ.proximum.*;
import java.util.*;
// Create index with crypto-hash enabled
try (ProximumVectorStore store = ProximumVectorStore.builder()
.dimensions(384)
.storagePath("/tmp/auditable-vectors")
.cryptoHash(true) // Enable auditability
.build()) {
// Add vectors and sync
store.add(embedding1, "doc-1");
store.add(embedding2, "doc-2");
store.sync();
// Get commit hash (unique identifier for this state)
UUID commitHash = store.getCommitHash();
System.out.println("Commit hash: " + commitHash);
// Verify integrity from cold storage
Map<String, Object> storeConfig = Map.of(
"backend", ":file",
"path", "/tmp/auditable-vectors"
);
Map<String, Object> result = ProximumVectorStore.verifyFromCold(storeConfig);
System.out.println("Valid: " + result.get("valid?"));
}
(require '[proximum.core :as core]
'[proximum.crypto :as crypto]
'[proximum.protocols :as p])
;; Create index with crypto-hash enabled
(def idx (core/create-index
{:type :hnsw
:dim 384
:crypto-hash? true ; Enable auditability
:store-config {:backend :file :path "/tmp/auditable-vectors"}
:mmap-dir "/tmp/mmap"}))
;; Add vectors and sync
(def idx (-> idx
(core/insert embedding1 :doc-1)
(core/insert embedding2 :doc-2)
(p/sync!)))
;; Get commit hash
(def commit-hash (crypto/get-commit-hash idx))
(println "Commit hash:" commit-hash)
;; Verify integrity from cold storage
(def result (crypto/verify-from-cold
{:backend :file :path "/tmp/auditable-vectors"}))
(println "Valid:" (:valid? result))
When crypto-hash is enabled, each sync() operation computes a commit hash from three components:
commit-hash = SHA-512(
parent-commit-hash + # Previous commit (git-like chaining)
vectors-hash + # Hash of all vector chunks
edges-hash # Hash of all HNSW edge chunks
)
This creates a Merkle-tree-like structure: any change to a vector or edge chunk changes that chunk's hash, which changes the commit hash, and every commit hash also depends on its full chain of parent commits.
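As a concrete sketch of the chaining (illustrative only: the actual chunk serialization is internal to Proximum, so these bytes will not match the library's commit ids), the three components can be fed into the JDK's SHA-512 digest:
import java.security.MessageDigest;
// Illustrative sketch of the chaining, not Proximum's internal implementation
static byte[] chainedCommitHash(byte[] parentCommitHash,
                                byte[] vectorsHash,
                                byte[] edgesHash) throws Exception {
    MessageDigest sha512 = MessageDigest.getInstance("SHA-512");
    sha512.update(parentCommitHash); // previous commit (git-like chaining)
    sha512.update(vectorsHash);      // hash over all vector chunks
    sha512.update(edgesHash);        // hash over all HNSW edge chunks
    return sha512.digest();
}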
The same input always produces the same hash:
// Two indices with identical data
ProximumVectorStore store1 = builder().cryptoHash(true).build();
ProximumVectorStore store2 = builder().cryptoHash(true).build();
store1.add(vector1, "doc-1");
store1.add(vector2, "doc-2");
store1.sync();
store2.add(vector1, "doc-1");
store2.add(vector2, "doc-2");
store2.sync();
// Commit hashes will be identical
assert store1.getCommitHash().equals(store2.getCommitHash());
Each commit includes the parent hash:
store.add(vector1, "doc-1");
store.sync();
UUID hash1 = store.getCommitHash(); // First commit
store.add(vector2, "doc-2");
store.sync();
UUID hash2 = store.getCommitHash(); // Second commit (includes hash1 in computation)
// Different hashes due to parent chaining
assert !hash1.equals(hash2);
This means a commit hash identifies the entire history of the index, not just its current contents: two indices share a hash only if they followed the identical sequence of commits.
For HIPAA, GDPR, or SOX compliance, maintain cryptographic proof of all changes:
// Create auditable patient records index
ProximumVectorStore medicalRecords = ProximumVectorStore.builder()
.dimensions(768)
.storagePath("/data/medical-records")
.cryptoHash(true) // Required for audit trail
.build();
// Each update creates auditable commit
medicalRecords.add(patientEmbedding, "patient-123");
medicalRecords.sync();
UUID auditHash = medicalRecords.getCommitHash();
// Log commit hash for compliance
auditLog.record("Updated patient records", auditHash, timestamp);
Verify backup integrity before attempting restore:
// Verify backup without loading entire index
Map<String, Object> storeConfig = Map.of(
"backend", ":file",
"path", "/backups/vectors-2024-01-15"
);
Map<String, Object> verification = ProximumVectorStore.verifyFromCold(storeConfig);
if ((Boolean) verification.get("valid?")) {
// Safe to restore
System.out.println("Backup verified: " + verification.get("commit-id"));
System.out.println("Vectors verified: " + verification.get("vectors-verified"));
System.out.println("Edges verified: " + verification.get("edges-verified"));
} else {
// Backup corrupted - don't restore
System.err.println("Backup corrupted: " + verification.get("error"));
}
Share vector indices with cryptographic proof of integrity:
// Producer: Create index and share commit hash
ProximumVectorStore store = builder().cryptoHash(true).build();
// ... populate index
store.sync();
UUID commitHash = store.getCommitHash();
// Share: Send both the index files AND the commit hash
shareWithPartner(storeFiles, commitHash);
// Consumer: Verify before using
Map<String, Object> verification = ProximumVectorStore.verifyFromCold(storeConfig);
UUID receivedHash = (UUID) verification.get("commit-id");
if (!receivedHash.equals(expectedCommitHash)) {
throw new SecurityException("Index tampered during transmission!");
}
Ensure exact vector store state for model reproducibility:
// Training: Record commit hash with model
model.train(data);
vectorStore.sync();
UUID vectorStoreHash = vectorStore.getCommitHash();
model.saveMetadata("vector_store_hash", vectorStoreHash);
// Inference: Verify correct vector store version
UUID currentHash = vectorStore.getCommitHash();
if (!currentHash.equals(model.getVectorStoreHash())) {
throw new IllegalStateException(
"Vector store version mismatch - results not reproducible!"
);
}
ProximumVectorStore store = ProximumVectorStore.builder()
.cryptoHash(true) // Enable crypto-hash mode
.build();
boolean enabled = store.isCryptoHash();
UUID commitHash = store.getCommitHash();
// Returns null if:
// - crypto-hash is disabled
// - no commits have been made yet (call sync() first)
List<Map<String, Object>> history = store.getHistory();
for (Map<String, Object> commit : history) {
System.out.println("Commit: " + commit.get("proximum/commit-id"));
System.out.println("Date: " + commit.get("proximum/created-at"));
System.out.println("Vectors: " + commit.get("proximum/vector-count"));
}
Map<String, Object> storeConfig = Map.of(
"backend", ":file",
"path", "/path/to/storage"
);
Map<String, Object> result = ProximumVectorStore.verifyFromCold(storeConfig);
// Check result
Boolean valid = (Boolean) result.get("valid?");
Integer vectorsVerified = (Integer) result.get("vectors-verified");
Integer edgesVerified = (Integer) result.get("edges-verified");
UUID commitId = (UUID) result.get("commit-id");
String error = (String) result.get("error"); // If invalid
(def idx (core/create-index
{:crypto-hash? true ; Enable crypto-hash mode
...}))
(crypto/crypto-hash? idx) ; => true/false
(crypto/get-commit-hash idx)
;; Returns UUID or nil if:
;; - crypto-hash is disabled
;; - no commits have been made yet (call sync! first)
(p/history idx)
;; Returns vector of commit maps:
;; [{:proximum/commit-id #uuid "..."
;; :proximum/created-at #inst "..."
;; :proximum/vector-count 100
;; ...}
;; ...]
(crypto/verify-from-cold
{:backend :file :path "/path/to/storage"}
:main ; branch (optional, defaults to :main)
)
;; Returns:
;; {:valid? true
;; :vectors-verified 10
;; :edges-verified 5
;; :commit-id #uuid "..."}
;;
;; Or if invalid:
;; {:valid? false
;; :error :vectors-invalid
;; :vectors-result {...}}
The verifyFromCold() operation enables offline verification of index integrity without loading the entire index into memory.
Verification reads all chunks from storage, but it does not reconstruct the index in memory, so it can be run against large cold backups without loading them.
Typical performance: ~1-2 seconds per GB of index data.
public class BackupVerifier {
public void verifyBackups(String backupDir) {
File[] backups = new File(backupDir).listFiles();
if (backups == null) return; // directory missing or unreadable - nothing to verify
for (File backup : backups) {
System.out.println("Verifying " + backup.getName());
Map<String, Object> config = Map.of(
"backend", ":file",
"path", backup.getAbsolutePath()
);
try {
Map<String, Object> result =
ProximumVectorStore.verifyFromCold(config);
if ((Boolean) result.get("valid?")) {
System.out.println("✓ Valid: " + result.get("commit-id"));
logBackupStatus(backup, "VALID", result);
} else {
System.err.println("✗ Invalid: " + result.get("error"));
logBackupStatus(backup, "CORRUPTED", result);
alertOps("Corrupted backup: " + backup.getName());
}
} catch (Exception e) {
System.err.println("✗ Error: " + e.getMessage());
logBackupStatus(backup, "ERROR", null);
}
}
}
}
Enable crypto-hash for production indices where data integrity matters:
// Development: crypto-hash optional
ProximumVectorStore devStore = builder()
.cryptoHash(false) // Faster for development
.build();
// Production: crypto-hash required
ProximumVectorStore prodStore = builder()
.cryptoHash(true) // ALWAYS enable in production
.build();
Log commit hashes to external audit log for tamper-proof records:
store.sync();
UUID commitHash = store.getCommitHash();
// Log to tamper-proof external system
auditLog.record(new AuditEntry(
    Instant.now(),        // timestamp
    "VECTOR_UPDATE",      // operation
    commitHash,           // commit hash
    currentUser.getId()   // user id
));
Schedule regular verification of backups:
# Cron job: Verify all backups daily
0 2 * * * /usr/local/bin/verify-vector-backups.sh
// verify-vector-backups.sh calls:
public class BackupVerificationJob {
public void run() {
List<String> backupPaths = listBackups();
for (String path : backupPaths) {
verifyBackup(path);
}
}
}
Include commit hashes in model metadata:
# Training
vector_store.sync()
commit_hash = vector_store.get_commit_hash()
model_metadata = {
"model_version": "1.2.0",
"training_date": "2024-01-15",
"vector_store_commit": str(commit_hash), # Required for reproducibility
"accuracy": 0.95
}
save_model(model, model_metadata)
Crypto-hash adds minimal overhead: roughly 5-10 ms of hash computation per sync() (see Troubleshooting below).
Only disable crypto-hash for throwaway development or test indices where auditability, tamper detection, and backup verification are not needed.
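If you want to check the cost in your own environment, a rough timing around the documented sync() call is usually enough; this is an illustrative measurement, not a rigorous benchmark, and absolute numbers depend on batch size and storage backend.
// Rough, illustrative timing of a single sync with crypto-hash enabled
long start = System.nanoTime();
store.sync();
long elapsedMs = (System.nanoTime() - start) / 1_000_000;
System.out.println("sync() took " + elapsedMs + " ms");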
Problem: getCommitHash() returns null
Causes:
- Crypto-hash not enabled (missing .cryptoHash(true) in the builder)
- No sync() called before reading the commit hash
Solution:
// Ensure crypto-hash enabled
ProximumVectorStore store = builder()
.cryptoHash(true) // Must be enabled
.build();
// Add vectors
store.add(vector, "id");
// Sync to compute hash
store.sync();
// Now hash is available
UUID hash = store.getCommitHash();
assert hash != null;
Problem: verifyFromCold() returns valid? = false
Causes:
- Corrupted vector chunks (vectors-invalid)
- Corrupted HNSW edge chunks (edges-invalid)
- Wrong or missing branch name (branch-not-found)
Solution:
Map<String, Object> result = ProximumVectorStore.verifyFromCold(config);
if (!(Boolean) result.get("valid?")) {
String error = (String) result.get("error");
switch (error) {
case "branch-not-found":
// Wrong branch name
System.err.println("Branch not found: " + result.get("branch"));
break;
case "vectors-invalid":
// Vector chunks corrupted
System.err.println("Vector corruption detected");
restoreFromBackup();
break;
case "edges-invalid":
// Edge chunks corrupted
System.err.println("HNSW graph corruption detected");
restoreFromBackup();
break;
}
}
Problem: Two indices with identical data have different commit hashes
Cause: Insert order or timing differences
Explanation: Commit hashes include the parent hash (git-like chaining), so indices that reach the same contents through a different sequence of inserts and syncs end up with different commit hashes.
This is expected behavior - commit hashes represent the entire history, not just current state.
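As a sketch of this effect (reusing the shorthand builder() from the earlier examples, with vector1 and vector2 standing in for any two embeddings), an index that syncs twice and an index that syncs once end up with the same contents but different commit chains:
// Same final contents, different commit history -> different commit hashes
ProximumVectorStore a = builder().cryptoHash(true).build();
a.add(vector1, "doc-1");
a.sync();                 // first commit
a.add(vector2, "doc-2");
a.sync();                 // second commit chains to the first
ProximumVectorStore b = builder().cryptoHash(true).build();
b.add(vector1, "doc-1");
b.add(vector2, "doc-2");
b.sync();                 // single commit with no intermediate parent
// Identical current contents, different histories
assert !a.getCommitHash().equals(b.getCommitHash());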
Problem: Sync is slower with crypto-hash enabled
Expected behavior: Crypto-hash adds ~5-10ms overhead per sync for hash computation.
If sync is much slower than that, the likely cause is syncing very frequently on small batches rather than the hash computation itself.
Mitigation: batch updates so the crypto-hash cost is paid once per batch:
// Batch updates to reduce sync frequency
for (int i = 0; i < 1000; i++) {
store.add(vectors[i], ids[i]);
}
// Single sync instead of 1000
store.sync(); // Only pays crypto-hash cost once
See examples/java/AuditableIndex.java for a complete working example.