This document summarizes research into zero-copy serialization options that could improve deserialization performance in konserve, particularly for the tiered store sync use case.
When using multi-get to retrieve many keys at once (e.g., during tiered store initialization from IndexedDB), the deserialization loop becomes the bottleneck. Even with efficient bulk I/O via single IndexedDB transactions, the CPU-bound deserialization of each blob limits throughput.
Current deserialization path (defaults.cljc):
The deserialization step requires parsing and allocating every nested data structure, which doesn't parallelize well in JavaScript's single-threaded environment.
Zero-copy serialization formats allow direct access to serialized data without a deserialization step. The data is laid out in memory in a way that allows pointer arithmetic to access fields directly from the serialized bytes.
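To make the idea of reading fields directly from serialized bytes concrete, here is a minimal sketch in Python. The record layout (field order, widths, little-endian, no padding) is an assumption made up for illustration and is not the layout of FlatBuffers or of any konserve format. The point is only that one field of one record can be read at a computed offset without decoding anything else.

```python
import struct

# Hypothetical fixed layout for one record (little-endian, no padding):
#   offset 0:  int64  e
#   offset 8:  int32  a
#   offset 12: int64  tx
#   offset 20: uint8  added
RECORD = struct.Struct("<qiqB")  # used here only to build the example bytes
RECORD_SIZE = RECORD.size        # 8 + 4 + 8 + 1 = 21 bytes
TX_OFFSET = 12

def build_buffer(records):
    """Serialize records back to back into one contiguous bytes object."""
    return b"".join(RECORD.pack(*r) for r in records)

def read_tx(buf, index):
    """Read only the tx field of record `index` straight from the buffer."""
    (tx,) = struct.unpack_from("<q", buf, index * RECORD_SIZE + TX_OFFSET)
    return tx

# memoryview avoids copying the underlying bytes when the buffer is passed around.
buf = memoryview(build_buffer([(1, 10, 100, 1), (2, 11, 101, 0), (3, 12, 102, 1)]))
print(read_tx(buf, 1))
```

Nothing is parsed or allocated for records 0 and 2, or for the other fields of record 1. That is the property a zero-copy format provides, at the price of a fixed, schema-defined layout.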
Benefits:
Drawbacks:
Status (FlatBuffers): Mature, cross-platform, good option for CLJ/CLJS
Compatibility with Konserve:
For Datahike:
:db/valueType
:db.type/string, :db.type/long, :db.type/double, :db.type/boolean, :db.type/instant, :db.type/uuid, :db.type/ref, :db.type/keyword
Implementation Effort: Medium
Status (Cap'n Proto): Mature but less ecosystem support than FlatBuffers
Compatibility:
Implementation Effort: Medium-High
Status (rkyv): Fastest zero-copy option, but Rust-only
Compatibility:
Implementation Effort: High (if possible at all)
Status (Lite³): Interesting research, C-only
Compatibility:
Implementation Effort: Very High
Status: Mature but NOT zero-copy
| Feature | FlatBuffers | Cap'n Proto | rkyv | Lite³ |
|---|---|---|---|---|
| JVM Support | Yes (official) | Yes (community) | No | Via JNI |
| JS Support | Yes (official) | Limited | No | Via WASM |
| Zero-copy read | Yes | Yes | Yes | Yes |
| Schema required | Yes | Yes | Yes | No |
| Write performance | Good | Good | Excellent | Good |
| Read performance | Excellent | Excellent | Best | Excellent |
| Ecosystem/Maturity | High | Medium | Medium | Low |
| Implementation effort | Medium | Medium-High | N/A | Very High |
FlatBuffers is the most practical option for CLJ/CLJS:
Define schemas for index structures:
```
table Datom {
  e: long;
  a: int;         // attribute id
  v: DatomValue;  // union type
  tx: long;
  added: bool;
}

union DatomValue {
  StringValue,
  LongValue,
  DoubleValue,
  // ... other db types
}

table Leaf {
  datoms: [Datom];
}

table Branch {
  keys: [Datom];
  children: [ulong];  // store-keys as references
}
```
Create hybrid serializer:
Integrate with konserve:
PStoreSerializer
To fully benefit from zero-copy:
:db/valueType set
If zero-copy is too complex, other optimizations:
Parallel deserialization (JVM only):
pmap or thread pool for concurrent deserialization
Lazy deserialization:
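One possible shape for lazy deserialization is sketched below in Python: each fetched blob is wrapped without being decoded, and decoding happens on first access only. JSON stands in for the real serializer here, and `LazyValue` and `multi_get_lazy` are hypothetical names for illustration, not part of konserve's API.

```python
import json

class LazyValue:
    """Holds the serialized bytes and decodes them only on first access."""

    def __init__(self, raw, decode=json.loads):
        self._raw = raw
        self._decode = decode
        self._value = None
        self._decoded = False

    @property
    def decoded(self):
        return self._decoded

    def get(self):
        if not self._decoded:
            self._value = self._decode(self._raw)
            self._decoded = True
            self._raw = None  # release the bytes once the value is cached
        return self._value

def multi_get_lazy(blobs):
    """Wrap each fetched blob without decoding it (hypothetical helper)."""
    return {key: LazyValue(raw) for key, raw in blobs.items()}

# Pretend these bytes came back from one bulk read.
blobs = {
    "root": b'{"children": ["a", "b"]}',
    "a": b'{"datoms": [1, 2]}',
    "b": b'{"datoms": [3]}',
}
store = multi_get_lazy(blobs)
root = store["root"].get()  # only the root is decoded at this point
print(root, store["a"].decoded)
```

This does not reduce the cost of decoding any single blob, but it moves that cost off the bulk-load path and skips it entirely for nodes that are never read.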
Better Fressian tuning:
WebAssembly for CLJS:
Zero-copy serialization via FlatBuffers is feasible for konserve/datahike but requires:
The payoff would be near-instant "deserialization" for index loads, potentially improving tiered store sync by 10-100x for the deserialization portion. However, given the implementation complexity, this should only be pursued if profiling confirms deserialization is a critical bottleneck in production use cases.
For now, the multi-get implementation provides significant I/O-level improvements. Zero-copy can be revisited when/if deserialization becomes the dominant bottleneck at scale.
Document created: December 2024
Context: multi-get implementation for tiered store sync optimization