Liking cljdoc? Tell your friends :D

Competitive Analysis: ordered-collections

Comparison with clojure.core/sorted-set, clojure.core/sorted-map, and clojure.data.avl.

Summary

Aspectordered-collectionsclojure.coreclojure.data.avl
Tree typeWeight-balancedRed-blackAVL
Set operationsO(m log(n/m+1)), parallelO(n) via clojure.setO(m log(n/m+1))
O(log n) nth/rankYesNoYes
Fast endpoint accessYesSeq last is O(n)Seq last is O(n)
Parallel foldYes (fork-join)NoNo
Interval treesYesNoNo
Segment treesYesNoNo
Range mapsYesNoNo
Fuzzy lookupYesNoNo
Memory/element~64 bytes~61 bytes~64 bytes

Algorithmic Differences

Set Operations

All three libraries support union, intersection, and difference.

clojure.core uses clojure.set/union etc., which iterate element-by-element. O(n log n) regardless of overlap.

In the current benchmark suite, ordered-collections also beats the unfair clojure.set + hash-set baseline on these workloads, but that comparison is best treated as exploratory rather than as the main competitive claim.

data.avl and ordered-collections both use Adams' divide-and-conquer algorithm: split one tree at the other's root, recurse on halves, join. O(m log(n/m + 1)) where m ≤ n — information-theoretically optimal. When one set is much smaller, this is dramatically better than linear.

ordered-collections additionally parallelizes the two independent recursive calls via ForkJoinPool, with granularity tuned from the benchmark suite. In practice, this yields very large wins over sorted-set and clear wins over data.avl on set algebra, especially as collections grow. See Benchmarks for current measurements.

Split and Join

| | clojure.core | data.avl | ordered-collections | |--|:---:|:---:|:---:| | split-key | — | O(log n) | O(log n) | | split-at | — | O(log n) | O(log n) | | join | — | O(log n) | O(log n) |

Both data.avl and ordered-collections expose split/join, but ordered-collections has materially lower constant factors in the current benchmark suite because weight composes trivially: weight(join(L, k, R)) = weight(L) + 1 + weight(R). AVL trees must recompute heights bottom-up after joining. Red-black trees must reconcile color invariants. This difference compounds in set operations, which call split/join at every level of recursion.

Positional Access

Both ordered-collections and data.avl track subtree sizes, enabling:

  • (nth coll i) — O(log n) vs O(n) for core sorted collections
  • (rank-of coll x) — 0-based index of element
  • (split-at i coll) — split at position

ordered-collections additionally provides slice, median, and percentile — all O(log n), all derived from nth.

First / Last

Operationclojure.coredata.avlordered-collections
(first coll)O(1)O(1)O(1)
java.util.SortedSet.last()O(log n)
seq-based lastO(n)O(n)O(n)

last on a 1M-element sorted-set scans the entire collection. ordered-collections also supports fast endpoint access through java.util.SortedSet.last(), which traverses only the right spine of the tree.

Parallel Fold

ordered-collections implements clojure.core.reducers/CollFold by splitting the tree into larger chunks and delegating those chunks to r/fold. Chunking follows the r/fold granularity provided by the caller rather than imposing a separate library-level floor.

sorted-set and data.avl fall back to sequential reduce.

Feature Comparison with data.avl

Featureordered-collectionsdata.avl
split-key / split-at
subrange
nearest (floor/ceiling)
nth / rank-of
slice / median / percentile
Parallel set operations
Parallel r/fold
Interval trees
Segment trees
Fuzzy sets/maps
Range maps
Priority queues
Multisets
EDN serialization
ClojureScript
Transient support

What data.avl has that we don't

ClojureScript support. ordered-collections is JVM-only (java.util.SortedSet, ForkJoinPool, etc.).

Transient support. data.avl supports transient/persistent! for batch mutation. ordered-collections uses persistent construction throughout, mitigated by parallel batch construction which is competitive in practice.

Memory

From memory_test.clj at N=100,000 (measured with clj-memory-meter):

CollectionBytes/Elementvs core
sorted-set60.61.00x
data.avl sorted-set64.01.06x
ordered-set64.01.06x
CollectionBytes/Entryvs core
sorted-map84.61.00x
data.avl sorted-map88.01.04x
ordered-map88.01.04x

The extra ~4 bytes/node is the weight field. Both data.avl and ordered-collections pay this cost for subtree size tracking.

Specialized Collections

Not available elsewhere in the Clojure ecosystem:

Interval trees — augmented with max-endpoint per subtree. O(log n + k) overlap queries for both point and interval queries:

(def events (oc/interval-set [[0 10] [5 15] [20 30]]))
(events [8 12])  ;=> ([0 10] [5 15])
(events 5)       ;=> ([0 10] [5 15])

Segment trees — pre-computed subtree aggregates. O(log n) range queries and updates with any associative operation:

(def st (oc/sum-tree {0 10, 1 20, 2 30, 3 40}))
(oc/query st 1 3)  ;=> 90

Range maps — non-overlapping [lo, hi) ranges. Inserting a range automatically carves out overlaps:

(def rm (oc/range-map {[0 10] :a [20 30] :b}))
(rm 5)   ;=> :a
(rm 15)  ;=> nil

Fuzzy sets/maps — nearest-neighbor by distance with configurable tiebreaking:

(def fs (oc/fuzzy-set [1.0 2.0 3.0 10.0]))
(fs 2.1)  ;=> 2.0

Can you improve this documentation?Edit on GitHub

cljdoc builds & hosts documentation for Clojure/Script libraries

Keyboard shortcuts
Ctrl+kJump to recent docs
Move to previous article
Move to next article
Ctrl+/Jump to the search field
× close