Query performance (speed)
Less code (option to make more layered)
Simplified storage
Possibility of sharing index data-structures between nodes
For the latter, we have the option of moving away from the K/V stores and to have our own memory mapped disk backend, and/or potentially to use succinct data structures (or alternatives).
Effort/Risk
Doesn’t offer any new features
Joins and potentially the entire result will need to fit into memory.
We have Aggregation decorators in progress.
Crux currently support point-in-time and limited history and history range queries* In many bitemporal papers they go far beyond this allowing for more complex queries across both time lines.
Examples:
'When did I change my address?'
'When did a person first earn more then X amount'
'Show me all fields that have been corrected over X time-frame'
(consult Jensen bitemp resources for more examples)
Opens up far more auditing capability
Bolsters 'bitemporal' claim
Z Curve index opens this up
Hard to reason about and adapt queries for what end-users want
Difficult to build
The straightforward approach is to simply assigning and managing document partitions intelligently.
Query nodes will need less disk.
Query nodes faster to index (subset vs full population)
Possibly complements Matrix (shared index chunks)
Complicates Crux.
Actual feature request might not arise
The simple version needs more complexity and virtual partitions to allow re-sharding.
Blurs the line between Crux, stream processing and rule engines a bit.
Opens up a world of possibilities.
Simple versions of this can be built around the transaction log and re-running queries.
Potentially use differential dataflow in an unbundled way as long as our log provides enough information.
Modelling it into the core would be intrusive* Could potentially be done when we introduce matrix, but it’s an orthogonal problem.
Using dataflow might lead to two different query engines with subtle differences unless we move it to the core.
Needs us to think properly about how we related streaming, which like times series might lead to taking on too much scope in Crux, or taking on areas we don’t really understand.
Enable rule engine and declarative programming on top of Crux itself* This is a potential alternative to transaction functions.
A lot of exciting possibilities
Likely to be some low-hanging fruit
Could lead to a Dedalus style programming model where you implement parts of your system in Crux’s Datalog directly* A rule engine
Trying to turn Crux into a rule engine might derail it from its primary bitemporal purpose
Transaction functions needs to be constrained and managed in various ways* Should be possible to evict
The non-decorator parts of Crux could be rewritten in another language, most likely Rust, for both speed and memory safety.
The core of Crux is using UnsafeBuffers and JNR and is often fighting Clojure and to a lesser extent the JVM to do things that would be comparatively straight forward in the right language.
Right tool for the job.
Control over speed* Opens up for SIMD and vectorization* Potentially no GC* Better interaction with native libraries.
Better memory safety if using Rust or similar language than when using sun.misc.Unsafe.
Potential to be used as a library beyond the JVM world.
Would still need to be embedded into Crux unless we rewrite the entire thing, which isn’t necessarily a good idea unless we find the right boundaries, as we want to keep the benefit of Clojure in the higher layers.
While Rust is the likely candidate, others might be more suitable in practice, so we need to spend time choosing the right tool.
We’re not experts in these languages.
We also need to understand their ecosystem, and how to deploy and debug such languages and also how to manage them operationally.
The feedback loop and ability to iterate is lower, so more likely to go stale.
The language needs to be fast, have zero or low-cost interaction with C libraries, and make managing memory easy, correct and cheap* No GC is preferable, but not a strict requirement* It also needs to be easy to call from Clojure and Java over JNI/JNR.
It should preferably be easy to work with and have reasonable feedback loops and compilation times.
Alternatives to Rust could maybe be Common Lisp, Ada, OCaml, D or even C itself combined with the right verification tools* We also have Java on Graal, and could double down on using native images.
Second tier languages that might also be worth considering are Go and Haskell, but they lack some of the desired requirements.
Can you improve this documentation? These fine people already did:
Jon Pither & Jeremy TaylorEdit on GitHub
cljdoc is a website building & hosting documentation for Clojure/Script libraries
× close