Liking cljdoc? Tell your friends :D

Benchmarks

This is the canonical performance document for the project: benchmark methodology, current numbers, and the implementation details that most directly explain those numbers.

Running

$ lein bench                  # Criterium, N=100K (~5 min)
$ lein bench --full           # Criterium, N=10K,100K,500K (~30 min)
$ lein bench --readme --full  # README tables only (~10 min)
$ lein bench --sizes 50000    # Custom sizes

$ lein bench-simple           # Quick iteration bench (100 to 100K)
$ lein bench-simple --full    # Full suite (100 to 1M)
$ lein bench-range-map        # Range-map vs Guava TreeRangeMap
$ lein bench-parallel         # Parallel threshold crossover analysis
$ lein bench-rope-tuning      # Rope chunk-size sweep

Results are written to bench-results/<timestamp>.edn with system info, sizes, and per-operation Criterium statistics.

Environment

| | | |--|--| | JVM | OpenJDK 25.0.2, 64-bit Server VM | | Clojure | 1.12.4 | | OS / Arch | Mac OS X 26.3.1 / aarch64 | | Available processors | 12 | | Heap | 8192 MB | | Method | Criterium quick-benchmark (6 samples, JIT warmup, outlier detection) |

Relative ratios are more meaningful than absolute times.

Set Operations

Two sets of size N with 50% overlap. Adams' divide-and-conquer with optional fork-join parallelism. Set algebra uses operation-specific root thresholds:

  • union 131,072
  • intersection 65,536
  • difference 131,072
  • ordered-map merge 65,536

Recursive re-forking currently uses 65,536 for all four operations, plus a 65,536 minimum-branch guard and a 64 sequential cutoff for tiny subtrees.

This is the library's dominant advantage. The split/join structure gives work-optimal set algebra, and fork-join parallelism helps extend that advantage to larger multicore workloads.

vs sorted-set (speedup)

OperationN=10KN=100KN=500K
Union15.4x26.4x56.6x
Intersection9.0x17.0x36.2x
Difference9.6x22.1x50.2x

vs data.avl (speedup)

OperationN=10KN=100KN=500K
Union10.9x20.5x42.1x
Intersection7.2x13.0x28.1x
Difference7.2x12.7x32.0x

Interpretation:

  • against sorted-set, the gap is mainly algorithmic: generic clojure.set paths over built-in sorted collections do not exploit a native split/join algebra
  • against data.avl, both libraries benefit from ordered trees, but this library's split/join constant factors are lower and the set operations also parallelize

vs clojure.set on hash-set (exploratory, unfair baseline)

This is not an ordered-collection comparison, so it should be read as an exploratory stress test rather than as the main benchmark story. Even so, the current split/join implementation still wins decisively.

OperationN=10KN=100KN=500K
Union4.2x7.2x16.3x
Intersection3.8x6.1x12.9x
Difference4.4x7.6x18.6x

Raw times (ms)

OperationNsorted-setdata.avlordered-set
Union10K3.632.560.24
100K42.7133.271.62
500K246.60183.404.36
Intersection10K2.321.850.26
100K30.3623.251.79
500K170.02131.964.70
Difference10K2.021.510.21
100K30.7017.711.39
500K156.97100.243.13

Raw times vs clojure.set on hash-set (ms)

OperationNclojure.set/hash-setordered-set
Union10K1.000.24
100K11.651.62
500K70.934.36
Intersection10K0.980.26
100K10.851.79
500K60.704.70
Difference10K0.910.21
100K10.511.39
500K58.123.13

Fold (r/fold)

Parallel fold via r/fold. The tree is split into equal subtrees and folded in parallel. sorted-set and data.avl fall back to sequential reduce.

Implementation note: CollFold is not just delegated blindly to r/fold. node-fold splits the tree eagerly in the caller thread and then folds chunk indices in parallel. That keeps split overhead under control and avoids depending on dynamic bindings inside worker tasks.

| | N=10K | N=100K | N=500K | |--|------:|-------:|-------:| | sorted-set | 0.33ms | 3.02ms | 16.37ms | | data.avl | 0.76ms | 4.34ms | 35.04ms | | ordered-set | 0.13ms | 0.73ms | 3.97ms | | vs sorted-set | 2.5x | 4.1x | 4.1x | | vs data.avl | 5.8x | 5.9x | 8.8x |

Construction

Batch from collection (parallel fold + union).

This is why constructor-based bulk loading is the right path to benchmark and the right path to use. Sequential conj is a different workload and is covered separately below.

| | N=10K | N=100K | N=500K | |--|------:|-------:|-------:| | sorted-set | 4.65ms | 75.02ms | 542.18ms | | data.avl | 2.33ms | 35.16ms | 291.11ms | | ordered-set | 1.55ms | 26.34ms | 172.21ms | | vs sorted-set | 3.0x | 2.8x | 3.1x | | vs data.avl | 1.5x | 1.3x | 1.7x |

Split

100 splits at random keys. Weight-balanced trees have lower constant factors — no height recomputation.

This is one of the cleanest demonstrations of the representation choice. Weight composes trivially after join; AVL trees must recompute heights bottom-up.

| | N=10K | N=100K | N=500K | |--|------:|-------:|-------:| | data.avl | 0.45ms | 0.68ms | 0.96ms | | ordered-set | 0.07ms | 0.09ms | 0.12ms | | vs data.avl | 6.8x | 7.2x | 7.8x |

Lookup

10K random lookups. These are all in the same practical performance tier; small differences are not especially meaningful compared with the much larger wins in set algebra, split/join-derived operations, construction, and fold.

Treat these as near-parity numbers, not as a headline differentiator.

Set (ms)

| | N=10K | N=100K | N=500K | |--|------:|-------:|-------:| | sorted-set | 1.47 | 2.14 | 2.89 | | data.avl | 1.51 | 2.13 | 2.85 | | ordered-set | 1.42 | 2.58 | 2.94 |

Map (ms)

| | N=10K | N=100K | N=500K | |--|------:|-------:|-------:| | sorted-map | 1.55 | 2.21 | 3.12 | | data.avl | 1.45 | 2.11 | 2.85 | | ordered-map | 1.21 | 1.87 | 2.71 |

Set Equality

Randomized integer sets, measured in isolation with lein bench-simple.

This is not one of the library's headline capabilities, but it is a useful sanity check for ordered-set's direct tree comparison path. The most meaningful cases are:

  • equal sets of the same size
  • same-size sets differing in one element

The cardinality-mismatch case is effectively a count check and is not very interesting.

Equal sets

| | N=1K | N=10K | N=100K | |--|-----:|------:|-------:| | hash-set | 0.03ms | 0.28ms | 3.66ms | | sorted-set | 0.10ms | 1.19ms | 14.80ms | | data.avl | 0.13ms | 1.40ms | 15.10ms | | ordered-set | 0.01ms | 0.10ms | 1.59ms | | vs hash-set | 3.4x | 2.8x | 2.3x | | vs sorted-set | 11.3x | 12.0x | 9.3x | | vs data.avl | 14.0x | 14.1x | 9.5x |

Interpretation:

  • by 1K, ordered-set is already competitive to clearly faster on equal-set comparison
  • by 10K, ordered-set's direct ordered comparison path is clearly better
  • by 100K, ordered-set is still several times faster than hash-set and about an order of magnitude faster than sorted-set and data.avl

Same size, one different element

| | N=1K | N=10K | N=100K | |--|-----:|------:|-------:| | hash-set | 0.01ms | 0.10ms | 0.33ms | | sorted-set | 0.10ms | 1.20ms | 14.93ms | | data.avl | 0.11ms | 1.36ms | 15.07ms | | ordered-set | 0.01ms | 0.10ms | 1.09ms |

Interpretation:

  • hash-set still wins the unequal cases at larger sizes
  • ordered-set still substantially beats sorted-set and data.avl
  • ordered-set is still in the same practical tier at 10K, then well ahead of the ordered competitors by 100K

This is a good example of where direct ordered traversal helps beyond set algebra itself: once sets are large enough, the library can compare two compatible ordered sets very efficiently without falling back to generic membership-oriented equality work.

Last Element

1000 calls. O(log n) via java.util.SortedSet.last() vs O(n) seq traversal.

This is endpoint access, not a claim about clojure.core/last.

| | N=10K | N=100K | |--|------:|-------:| | sorted-set | 424ms | 5,085ms | | data.avl | 538ms | 5,376ms | | ordered-set | 0.12ms | 0.13ms |

~40,000x faster at N=100K. Gap grows linearly with N.

Iteration (reduce)

Both ordered-collections and data.avl implement direct tree reduction paths, which is why both are much faster than seq-driven reduction over sorted-set.

| | N=10K | N=100K | N=500K | |--|------:|-------:|-------:| | sorted-set | 0.31ms | 3.43ms | 17.81ms | | data.avl | 0.06ms | 0.95ms | 5.31ms | | ordered-set | 0.07ms | 0.98ms | 4.84ms |

4–5x faster than sorted-set at larger sizes. On par with data.avl.

Insert (sequential conj, not batch)

This section is deliberately separate from construction. It measures repeated single-element mutation, not bulk loading.

| | N=10K | N=100K | N=500K | |--|------:|-------:|-------:| | sorted-set | 6.37ms | 71.62ms | 1,220ms | | data.avl | 5.25ms | 49.26ms | 983ms | | ordered-set | 4.61ms | 44.81ms | 906ms |

Delete

| | N=10K | N=100K | N=500K | |--|------:|-------:|-------:| | sorted-set | 3.30ms | 36.43ms | 621.05ms | | data.avl | 1.91ms | 26.13ms | 482.01ms | | ordered-set | 1.82ms | 22.90ms | 461.38ms |

Positional Access

Both O(log n). sorted-set has no nth/rank.

These are enabled by subtree sizes. For this library, size is part of the core tree invariant rather than an add-on feature.

nth (10K accesses by index)

| | N=10K | N=100K | N=500K | |--|------:|-------:|-------:| | data.avl | 0.85ms | 1.73ms | 2.83ms | | ordered-set | 1.40ms | 2.00ms | 2.80ms |

data.avl is usually faster, but the gap narrows substantially at 500K.

rank-of (10K lookups)

| | N=10K | N=100K | N=500K | |--|------:|-------:|-------:| | data.avl | 2.39ms | 3.49ms | 6.03ms | | ordered-set | 1.15ms | 2.62ms | 4.99ms |

ordered-set is ~1.2-2.1x faster in these measurements.

Interval Collections

Construction (ms)

| | N=10K | N=100K | N=500K | |--|------:|-------:|-------:| | interval-set | 9.3 | 191.9 | 1,367.3 | | interval-map | 10.4 | 233.0 | 1,667.2 |

Lookup (1K overlap queries, ms)

| | N=10K | N=100K | N=500K | |--|------:|-------:|-------:| | interval-map | 61.4 | 80.3 | 94.9 |

Sub-linear growth — O(log n + k) means query time depends more on result count than collection size.

Map Construction

| | N=10K | N=100K | N=500K | |--|------:|-------:|-------:| | sorted-map | 3.14ms | 55.25ms | 436.80ms | | data.avl | 2.37ms | 37.41ms | 335.96ms | | ordered-map | 1.45ms | 28.81ms | 191.14ms |

2.3x faster than sorted-map and 1.8x faster than data.avl at N=500K in these measurements.

Rope vs PersistentVector

The rope is a persistent sequence optimized for structural editing. Where PersistentVector is O(n) for mid-sequence splice/insert/remove, the rope is O(log n). The advantage is unbounded and grows linearly with collection size.

Structural Editing

WorkloadN=10KN=100KN=500K
200 random edits — rope2.6ms2.2ms2.9ms
200 random edits — vector97ms1.04s5.4s
Speedup38x473x1862x
WorkloadN=10KN=100KN=500K
Single splice — rope128µs54µs49µs
Single splice — vector842µs6.0ms27ms
Speedup7x111x551x

At 500K elements, 200 random splice operations take ~3ms on the rope vs ~5.4 seconds on the vector. The rope's time is nearly constant across sizes because each operation is O(log n).

Concatenation

WorkloadN=10KN=100KN=500K
Concat many pieces — rope36µs123µs364µs
Concat many pieces — vector102µs800µs3.4ms
Speedup3x7x9x

Bulk concatenation collects chunks in O(total chunks) and builds the tree directly, avoiding pairwise tree operations.

Reduce

WorkloadN=10KN=100KN=500K
Reduce sum — rope145µs601µs2.8ms
Reduce sum — vector93µs617µs3.6ms
Ratio0.6x~1x1.3x

The rope beats vectors on reduce at N >= 100K because 256-element chunks give better cache locality per reduction step than PersistentVector's 32-wide trie nodes. The rope uses a direct in-order tree walk (no enumerator frames) and delegates to the vector's native .reduce for chunk-internal iteration.

Parallel Fold

WorkloadN=10KN=100KN=500K
Fold sum — rope0.22ms0.31ms0.85ms
Fold sum — vector1.23ms0.47ms1.13ms
Speedup5.6x1.5x1.3x

The rope's r/fold uses tree-based fork-join decomposition — split at the midpoint, fork left, compute right inline, join. This maps directly onto the ForkJoinPool work-stealing model without a separate chunking pass. At small N the speedup is largest because the rope's tree structure provides immediate parallelism while the vector's fold has higher setup overhead.

Rope vs String (text workload)

For text-editing workloads, the rope also beats java.lang.String at scale:

Workload (N=100K chars)RopeStringSpeedup
Splice 32 chars at midpoint3.6µs13.4µs3.8x
Split at midpoint425ns2.6µs6.1x

String splice and split copy the entire string (O(n)); the rope does O(log n) tree work.

Where the Rope Loses

WorkloadRatioWhy
Split / slice~20x slowerO(log n) vs O(1) subvec wrapper
Random nth0.4-0.7xO(log n) vs O(1) trie lookup
Build via transient2-3x slowerPeriodic O(log n) tree flush vs O(1) array append

These losses are inherent to the tree-backed design, not to the implementation. The absolute times are microseconds.

Specialized Node Types

Primitive-specialized node types reduce boxing and comparison overhead for common homogeneous-key cases:

(long-ordered-set data)     ;; unboxed long keys
(double-ordered-set data)   ;; unboxed double keys
(string-ordered-set data)   ;; String.compareTo

These are constant-factor optimizations on top of the same tree algebra, not a separate implementation strategy.

Can you improve this documentation?Edit on GitHub

cljdoc builds & hosts documentation for Clojure/Script libraries

Keyboard shortcuts
Ctrl+kJump to recent docs
Move to previous article
Move to next article
Ctrl+/Jump to the search field
× close