Comparison with clojure.core/sorted-set, clojure.core/sorted-map, and clojure.data.avl.
| Aspect | ordered-collections | clojure.core | clojure.data.avl |
|---|---|---|---|
| Tree type | Weight-balanced | Red-black | AVL |
| Set operations | O(m log(n/m+1)), parallel | O(n) via clojure.set | O(m log(n/m+1)) |
| O(log n) nth/rank | Yes | No | Yes |
| Fast endpoint access | Yes | Seq last is O(n) | Seq last is O(n) |
| Parallel fold | Yes (fork-join) | No | No |
| Interval trees | Yes | No | No |
| Segment trees | Yes | No | No |
| Range maps | Yes | No | No |
| Fuzzy lookup | Yes | No | No |
| Memory/element | ~64 bytes | ~61 bytes | ~64 bytes |
All three libraries support union, intersection, and difference.
clojure.core uses clojure.set/union etc., which iterate element-by-element. O(n log n) regardless of overlap.
In the current benchmark suite, ordered-collections also beats the unfair
clojure.set + hash-set baseline on these workloads, but that comparison is
best treated as exploratory rather than as the main competitive claim.
data.avl and ordered-collections both use Adams' divide-and-conquer algorithm: split one tree at the other's root, recurse on halves, join. O(m log(n/m + 1)) where m ≤ n — information-theoretically optimal. When one set is much smaller, this is dramatically better than linear.
ordered-collections additionally parallelizes the two independent recursive
calls via ForkJoinPool, with granularity tuned from the benchmark suite.
In practice, this yields very large wins over sorted-set and clear wins over
data.avl on set algebra, especially as collections grow. See
Benchmarks for current measurements.
| | clojure.core | data.avl | ordered-collections |
|--|:---:|:---:|:---:|
| split-key | — | O(log n) | O(log n) |
| split-at | — | O(log n) | O(log n) |
| join | — | O(log n) | O(log n) |
Both data.avl and ordered-collections expose split/join, but ordered-collections
has materially lower constant factors in the current benchmark suite because
weight composes trivially: weight(join(L, k, R)) = weight(L) + 1 + weight(R).
AVL trees must recompute heights bottom-up after joining. Red-black trees must
reconcile color invariants. This difference compounds in set operations, which
call split/join at every level of recursion.
Both ordered-collections and data.avl track subtree sizes, enabling:
(nth coll i) — O(log n) vs O(n) for core sorted collections(rank-of coll x) — 0-based index of element(split-at i coll) — split at positionordered-collections additionally provides slice, median, and percentile — all O(log n), all derived from nth.
| Operation | clojure.core | data.avl | ordered-collections |
|---|---|---|---|
(first coll) | O(1) | O(1) | O(1) |
java.util.SortedSet.last() | — | — | O(log n) |
seq-based last | O(n) | O(n) | O(n) |
last on a 1M-element sorted-set scans the entire collection. ordered-collections
also supports fast endpoint access through java.util.SortedSet.last(), which
traverses only the right spine of the tree.
ordered-collections implements clojure.core.reducers/CollFold by splitting
the tree into larger chunks and delegating those chunks to r/fold. Chunking
follows the r/fold granularity provided by the caller rather than imposing a
separate library-level floor.
sorted-set and data.avl fall back to sequential reduce.
| Feature | ordered-collections | data.avl |
|---|---|---|
split-key / split-at | ✓ | ✓ |
subrange | ✓ | ✓ |
nearest (floor/ceiling) | ✓ | ✓ |
nth / rank-of | ✓ | ✓ |
slice / median / percentile | ✓ | — |
| Parallel set operations | ✓ | — |
Parallel r/fold | ✓ | — |
| Interval trees | ✓ | — |
| Segment trees | ✓ | — |
| Fuzzy sets/maps | ✓ | — |
| Range maps | ✓ | — |
| Priority queues | ✓ | — |
| Multisets | ✓ | — |
| EDN serialization | ✓ | ✓ |
| ClojureScript | — | ✓ |
| Transient support | — | ✓ |
ClojureScript support. ordered-collections is JVM-only (java.util.SortedSet, ForkJoinPool, etc.).
Transient support. data.avl supports transient/persistent! for batch mutation. ordered-collections uses persistent construction throughout, mitigated by parallel batch construction which is competitive in practice.
From memory_test.clj at N=100,000 (measured with clj-memory-meter):
| Collection | Bytes/Element | vs core |
|---|---|---|
| sorted-set | 60.6 | 1.00x |
| data.avl sorted-set | 64.0 | 1.06x |
| ordered-set | 64.0 | 1.06x |
| Collection | Bytes/Entry | vs core |
|---|---|---|
| sorted-map | 84.6 | 1.00x |
| data.avl sorted-map | 88.0 | 1.04x |
| ordered-map | 88.0 | 1.04x |
The extra ~4 bytes/node is the weight field. Both data.avl and ordered-collections pay this cost for subtree size tracking.
Not available elsewhere in the Clojure ecosystem:
Interval trees — augmented with max-endpoint per subtree. O(log n + k) overlap queries for both point and interval queries:
(def events (oc/interval-set [[0 10] [5 15] [20 30]]))
(events [8 12]) ;=> ([0 10] [5 15])
(events 5) ;=> ([0 10] [5 15])
Segment trees — pre-computed subtree aggregates. O(log n) range queries and updates with any associative operation:
(def st (oc/sum-tree {0 10, 1 20, 2 30, 3 40}))
(oc/query st 1 3) ;=> 90
Range maps — non-overlapping [lo, hi) ranges. Inserting a range automatically carves out overlaps:
(def rm (oc/range-map {[0 10] :a [20 30] :b}))
(rm 5) ;=> :a
(rm 15) ;=> nil
Fuzzy sets/maps — nearest-neighbor by distance with configurable tiebreaking:
(def fs (oc/fuzzy-set [1.0 2.0 3.0 10.0]))
(fs 2.1) ;=> 2.0
Can you improve this documentation?Edit on GitHub
cljdoc builds & hosts documentation for Clojure/Script libraries
| Ctrl+k | Jump to recent docs |
| ← | Move to previous article |
| → | Move to next article |
| Ctrl+/ | Jump to the search field |