Liking cljdoc? Tell your friends :D

Changelog

[0.2.1] - 2026-04-17

New Collection Types

  • StringRope (string-rope) — persistent chunked text sequence backed by java.lang.String chunks. Implements java.lang.CharSequence so it drops into re-find/re-seq/re-matches, clojure.string, and any Java API expecting text. Equality with String is content-based and hash-compatible. #string/rope "…" tagged literal with EDN round-trip. Constructor: string-rope / string-rope-concat. At 100K+ characters, up to ~38x faster than String on repeated structural edits, growing to ~130x at 500K.
  • ByteRope (byte-rope) — persistent chunked binary sequence backed by byte[] chunks. Unsigned byte semantics (0–255 as long). Unsigned lexicographic Comparable via Arrays.compareUnsigned. #byte/rope "hex" tagged literal. Constructor: byte-rope / byte-rope-concat. Extras: byte-rope-bytes, byte-rope-hex, byte-rope-write, byte-rope-input-stream, byte-rope-get-byte/-short/-int/-long (plus -le variants), byte-rope-index-of, and a streaming byte-rope-digest that feeds chunks through java.security.MessageDigest without materialization.

Rope Family Improvements

  • Flat-mode optimization for all three rope variants (rope, string-rope, byte-rope). When a rope's element count is at or below the per-variant flat threshold (1024 elements, characters, or bytes), the rope stores its content as a bare concrete collection (PersistentVector, java.lang.String, or byte[]) directly in the root field, skipping the tree wrapper entirely. Reads dispatch straight to the native type with zero indirection overhead; edits that grow the rope past the threshold transparently promote to chunked tree form; transients demote back to flat form at persistent! time when the result fits. Memory for small ropes is essentially identical to the natural baseline (1.00x vs PersistentVector / String / byte[]). StringRope and ByteRope had this from day one; the generic Rope gained it late in the 0.2.1 cycle so all three variants now share the same optimization pattern.
  • Per-variant Chunk Size Invariant (CSI) — each rope variant now declares its own +target-chunk-size+ / +min-chunk-size+ constants and binds them via its with-tree macro into the kernel's new *target-chunk-size* / *min-chunk-size* dynamic vars. Tuned via lein bench-rope-tuning: all three variants default to 1024/512 (up from the historical 256/128). At 500K elements, generic Rope gains +41% nth, +38% split, and 5x concat; StringRope and ByteRope improve on every measured operation.
  • kernel/chunk.clj — extracted from kernel/rope.clj. Holds the PRopeChunk protocol extensions for the three chunk backends (APersistentVector, String, byte[]) as a standalone kernel submodule. kernel/rope.clj drops from 1237 to 1155 lines and is now purely the rope tree algebra.
  • StringRope internals refactorwith-tree macro replaces 16+ copies of the (binding [*t-join* alloc] ...) form; ->StringRope* helper replaces 35+ copies of the 6-arg constructor; coll->str and coll->tree-root coercion helpers deduplicate scattered dispatch logic in the PRope method bodies.
  • Monomorphic hot paths for nth and reduce on all three rope variants. Each variant's deftype now inlines the tree walk directly, replacing the generic kernel's protocol-dispatched rope-nth / rope-chunk-at / rope-reduce with concrete chunk-type calls (alength/aget for byte[], .length/.charAt for String, .count/.nth for vector). Eliminates per-tree-level PRopeChunk protocol dispatch (~9 dispatches per nth at N=500K), the [chunk offset] tuple allocation that rope-chunk-at returned on every call, and per-chunk chunk-reduce-init dispatch on every leaf during reduce. Measured at N=500K (1000 random nth, full reduce):
    • Rope nth: 106 → 58 µs (1.8x faster, 0.09x → 0.16x vs vector)
    • StringRope nth: 120 → 50 µs (2.4x faster, 0.013x → 0.030x vs String)
    • ByteRope nth: 145 → 62 µs (2.3x faster, 0.003x → 0.015x vs byte[])
    • StringRope reduce: 1.81 → 1.07 ms (1.7x faster, 0.31x → 0.52x vs String)
    • ByteRope reduce: 3.53 → 1.91 ms (1.8x faster)
    • No structural-op regression: splice, concat, insert, remove, and repeated-edits all within ±3% of prior run.
  • Removed cursor cache from StringRope and ByteRope. The volatile-mutable _cc_chunk/_cc_start/_cc_end fields introduced torn-read races under concurrent access (three volatile writes are not atomic as a group) and caused cache thrashing when two threads did sequential access on the same rope instance — violating the thread-safety guarantees expected of persistent data structures. The monomorphic tree walk is fast enough (~50–70 ns per nth at N=500K) that the cache's benefit on sequential access was not worth the correctness cost. If sequential charAt throughput becomes a bottleneck for regex-heavy workloads, an explicit cursor wrapper (opt-in, not shared) may be added in a future release.
  • rope-splice-inplace fused single-chunk splice path avoids an intermediate chunk-splice allocation on the overflow path via chunk-splice-split.

Performance Improvements

  • Primitive rank for long-ordered-set / string-ordered-set / long-ordered-map / string-ordered-map. rank-of and indexOf now dispatch to node-rank-long / node-rank-string on primitive- specialized collections, bypassing the generic Comparator dispatch. Matches the existing primitive fast-path pattern already used for contains / find / find-val. At N=1K, rank on a long-ordered-set is ~4x faster than on a data.avl/sorted-set; string-ordered-set rank is ~3.4x faster.
  • Range-map bulk construction. (range-map coll) with sorted disjoint input now takes an O(n) balanced-build path (node-build- sorted) instead of per-entry assoc with carve-out. Input with overlapping ranges falls through to the general carving path, preserving "later wins" semantics. ~10x faster than the previous per-insert path; at N=1K the bulk path is already ~2.2x faster than Guava's TreeRangeMap construction.
  • Non-allocating java.util.Iterator for OrderedSet / OrderedMap. A new tree/NodeIterator deftype advances the tree enumerator in place via an unsynchronized-mutable field, avoiding the per-step seq-cell allocation of clojure.lang.SeqIterator over a lazy seq. Thread-safety contract is unchanged: the iterator is per-call fresh (no shared state on the collection), matching the memory model of SeqIterator. Java-style iteration on OrderedSet is now ~2x faster than on sorted-set and ~3.6x faster than on data.avl at N=1K.

Refactoring

  • RopeSeq / RopeSeqReverse moved from kernel/rope.clj into types/rope.clj. These generic-Rope-specific seq types were only used by the generic Rope deftype — StringRope and ByteRope carry their own monomorphic seq types. Relocating them makes kernel/rope.clj honestly chunk-protocol-agnostic and cuts ~220 lines from the kernel (now 1001 lines). No user-visible change.

Bug Fixes

  • Primitive node specialization preserved across mutations. conj/disjoin on OrderedSet and assoc/without on OrderedMap were passing the generic SimpleNode constructor instead of the collection's stored allocator. After a single conj on a long-ordered-set, the root silently degraded from LongKeyNode to SimpleNode, losing unboxed-key performance. Fixed by threading alloc through all node-add/node-remove call sites. ordered-merge-with also propagated nil alloc/stitch into the result; fixed.
  • PriorityQueue and OrderedMultiset getAllocator and getStitch returned nil instead of the generic constructor, violating the INodeCollection/IBalancedCollection contract.
  • Empty StringRope charAt and nth dereferenced nil root instead of throwing bounds exceptions.
  • StringRope valAt threw ClassCastException on non-integer keys (e.g., (get sr :x)). Added integer? guard.
  • Empty StringRope/ByteRope r/fold crashed instead of returning (combinef).
  • ByteRope InputStream.read(buf, off, 0) returned -1 at EOF instead of 0 per InputStream contract.
  • Auto-boxing in str->root and bytes->root. The loop variable pos was inferred as primitive long but the recur argument came from clojure.core/min (Object) and unchecked-dec-int (int), forcing auto-boxing per iteration. Threaded as primitive long throughout using unchecked-add / unchecked-dec / unchecked-int consistently. Pre-existing latent warning exposed when compiling under *warn-on-reflection*.

Benchmarks and Tooling

  • lein bench-rope-tuning fully rewritten to sweep chunk sizes across all three rope variants (Rope vs Vector, StringRope vs String, ByteRope vs byte[]). Reports per-operation speedups and a geomean score for ranking. Supports --variant rope|string-rope|byte-rope.
  • lein bench (bench_runner.clj) full suite gains N=1000 and N=5000 cardinalities alongside the existing 10K/100K/500K. The 1K column exercises flat-mode for all three rope variants; the 5K column exercises the smallest tree-mode regime.
  • lein bench-simple gains a :rope category (alongside the existing :string-rope and new :byte-rope categories) and adds N=5000 to the shared size defaults.
  • Memory test (memory_test.clj) gains string-rope-memory and byte-rope-memory deftests plus a new rope family section in the summary report table, showing all three variants against their natural baselines. The specialized-collection-memory deftest extends to cover range-map, segment-tree, and fuzzy-map (previously only interval-set/-map, multiset, priority-queue, and fuzzy-set).
  • lein bench-report gains three new sections: Performance by Category (aggregated wins/parity/losses per category with geomean speedup and best/worst case), Rope Family at Scale (side-by-side speedups for all three rope variants on structural ops), and Significant Wins (parallel to the existing Significant Losses section — the significant-wins analyzer was always computed but previously not rendered). All existing sections — Headline Performance, Parity, Significant Losses, Full Scorecard, Regressions, Improvements — render identically.
  • lein bench auto-compare — after writing a fresh bench-results/<timestamp>.edn, the runner looks for the most-recent prior EDN in the same directory, flat-walks both files, matches leaf measurements by (size, group, variant), and prints a compact Regressions / Improvements section with timing deltas. Self-contained (no dependency on the bb report tool); suggests lein bench-report --baseline for the full comparison.
  • Main bench suite coverage paritybench_runner.clj now benchmarks range-map, segment-tree, priority-queue, ordered-multiset, fuzzy-set, and fuzzy-map alongside the existing set / map / rope coverage. Previously these types were only exercised by specialized scripts (lein bench-range-map) or not at all, which meant the main lein bench --full pipeline and bench-report had no visibility into their performance.
  • lein bench-charts generates 7 PNG charts in doc/charts/ from the latest benchmark EDN via XChart. Charts: set-algebra scaling, rope editing scaling, collection winners (dot plot), rope operations profile (win/loss), rope vs vector absolute time (diverging lines), StringRope crossover, ByteRope crossover.
  • lein bench-report auto-baseline — when --baseline is not specified, the report automatically selects the prior timestamped EDN, so Regressions and Improvements sections render by default. Headline sections now include ordered-set, ordered-map, long-specialized, and string-specialized vs their competitors.
  • Rope tuner scoringlein bench-rope-tuning now uses structural-editing geomean (splice, split, concat) as the primary score, with the equal-weight geomean shown as a secondary all column. The old equal-weight geomean was misleadingly driven by concat scaling.
  • lein bench-report --publish suppresses the Full Scorecard, Regressions, and Improvements sections. These are useful for interactive A/B review during development but are noise for outside readers of the committed doc/report.txt snapshot. The default (no flag) still shows everything. Recommended snapshot workflow: lein bench-report --publish > doc/report.txt.
  • New bench cases exercising the primitive-rank / range-map-bulk / iterator optimizations. bench-long-rank-lookup and bench-string-rank-lookup hit the primitive node-rank-long / node-rank-string paths that the generic bench-rank-lookup missed. bench-range-map-bulk-construction uses the single-argument (core/range-map coll) constructor to exercise the new O(n) balanced-build path alongside the existing per-insert bench-range-map-construction. bench-set-iteration-iterator traverses via .iterator() to exercise NodeIterator (the existing bench-set-iteration goes through reduce).

Documentation

  • Cookbook restructured with six rope recipes at the front (text editor, regex on StringRope, bulk sequence assembly, binary protocol, streaming digest, undo history). Duplicate section numbering cleaned up; existing collection recipes renumbered.
  • Ropes gains a "Chunk Abstraction: One Kernel, Many Backends" section explaining PRopeChunk and kernel/chunk.clj, a "Specialized Ropes" section with per-variant design and examples, and a variant-picker table. API section now covers all three variants with the shared PRope surface up front.
  • Collections API gains full StringRope and ByteRope sections with constructors, interfaces, and per-variant operations.

[0.2.0] - 2026-04-08

New Collection Types

  • Rope (rope) — persistent chunked sequence for O(log n) structural editing (concat, split, splice, insert, remove). Backed by a weight-balanced tree of chunk vectors with a formal Chunk Size Invariant. Up to 1968x faster than PersistentVector on repeated random edits at 500K elements; 3-10x faster on concatenation; 1.3-1.7x faster on reduce at scale. Includes structure-sharing subrange views via rope-sub, transient support for batch construction, parallel r/fold, java.util.List interop, and lexicographic Comparable.
  • Range Map (range-map) — non-overlapping [lo, hi) ranges with automatic carve-out on insert
  • Segment Tree (segment-tree, sum-tree, min-tree, max-tree) — O(log n) range aggregation with any associative operation
  • Priority Queue (priority-queue) — O(log n) push/peek/pop with min and max access
  • Ordered Multiset (ordered-multiset) — sorted bag allowing duplicate elements
  • Fuzzy Set/Map (fuzzy-set, fuzzy-map) — nearest-neighbor lookup by distance with configurable tiebreaking

New Operations

  • Set algebra: union, intersection, difference, subset?, superset?, disjoint?
  • Positional: rank, slice, median, percentile
  • Navigation: nearest (floor/ceiling with keyword tests :<=, :>=, :<, :>), subrange, split-key, split-at
  • Interval: overlapping, span
  • Range map: ranges, span, gaps, assoc-coalescing, get-entry, range-remove
  • Segment tree: query, aggregate, update-val, update-fn
  • Priority queue: push, push-all, peek-min, peek-min-val, pop-min, peek-max, peek-max-val, pop-max
  • Multiset: multiplicity, disj-one, disj-all, distinct-elements, element-frequencies
  • Fuzzy: fuzzy-nearest, fuzzy-exact-contains?, fuzzy-exact-get
  • Rope: rope-concat, rope-concat-all, rope-split, rope-sub, rope-insert, rope-remove, rope-splice, rope-chunks, rope-chunks-reverse, rope-chunk-count, rope-str
  • Map: assoc-new, ordered-merge-with
  • Comparator: general-compare — opt-in total order over all values including non-Comparable types (Namespace, Var, etc.). ~20% slower lookups on Comparable types vs default.

Specialized Constructors

  • Rope: rope
  • Type-specific: long-ordered-set, long-ordered-map, double-ordered-set, double-ordered-map, string-ordered-set, string-ordered-map
  • Custom comparator: ordered-set-by, ordered-map-by, ordered-set-with, ordered-map-with, ordered-multiset-by, fuzzy-set-by, fuzzy-map-by, segment-tree-by, segment-tree-with
  • Exported comparators: long-compare, double-compare, string-compare, general-compare, compare-by

Interface Implementations

  • clojure.lang.Sorted — native subseq/rsubseq on ordered-set and ordered-map
  • clojure.core.reducers/CollFold — chunked parallel fold; ordered-set/map, rope, and compatible tree-backed types split into larger chunks before delegating to r/fold
  • clojure.lang.IEditableCollection / ITransientCollection — transient support for Rope with mutable tail buffer for efficient batch construction
  • clojure.core.protocols/CollReduce — implemented directly on all collection deftypes for correct fast-path reduction
  • clojure.lang.IHashEq — correct hash for use in hash-based collections
  • java.io.Serializable — Java serialization support
  • IReduceInit/IReduce — direct tree traversal for fast reduce
  • Direct ISeq implementations (KeySeq, EntrySeq) replace lazy-seq wrappers

EDN Tagged Literals

Round-trip serialization via data_readers.clj: #ordered/set, #ordered/map, #interval/set, #interval/map, #range/map, #priority/queue, #multi/set, #vec/rope. Collections with custom comparators (including general-compare) print in opaque #<Type ...> form to avoid non-round-trippable tagged literals.

Performance

  • Parallel set operations via ForkJoinPool with operation-specific root thresholds (65,536–131,072), 65,536 recursive thresholds, and 64 sequential cutoff
  • Primitive node types (LongKeyNode, DoubleKeyNode) — unboxed key storage
  • Primitive lookup fast pathlong-ordered-set bypasses Comparator dispatch
  • Interval overlapintersects? reduced from up to 12 comparisons to 2 (closed-interval identity: a0 <= b1 AND a1 <= b0)
  • Reduction refactor — unary reducers (nodes, keys, entries) share a single enumerator-based kernel; kv reducers remain separate to avoid packing overhead. All support reduced short-circuiting.
  • Fold benchmarking includes a non-trivial frequency-map workload comparing ordered-set fold against hash-set reduce, sorted-set fold/reduce, and data.avl fold/reduce
  • Benchmark/test infrastructure shares common workload generators, reference helpers, and competitor builders via test-utils and bench-utils

See benchmarks for current numbers and analysis.

Build

  • Namespace root is now ordered-collections.* / ordered_collections.* rather than com.dean.ordered-collections.*
  • Source/test tree lives under src/ordered_collections and test/ordered_collections
  • lein stats — babashka-based project statistics report (clj-kondo analysis, git churn, codebase metrics)

Bug Fixes

  • SortedSet.tailSet now returns elements >= x (was exclusive)
  • SortedSet.subSet now returns elements >= from, < to
  • Interval tree construction uses sequential reduce (parallel fold lost dynamic binding for node allocator at >2048 elements)
  • segment-tree range queries are generic over ordered keys rather than assuming integer-only query bounds
  • general-compare collections print opaque (#<OrderedSet ...>) rather than emitting tagged literals that cannot round-trip through EDN
  • Priority queue uses direct seq adapters instead of lazy map wrappers; stronger coverage for duplicate-priority ordering and boundary cases

Documentation

  • New Ropes — rope tutorial, use cases, and design
  • New Collections API — per-type constructor and operation reference
  • Migration guide corrected (oc/rank not oc/rank-of)
  • Performance documentation consolidated in doc/benchmarks.md; the older perf-analysis.md and when-to-use.md are removed
  • Cookbook examples refreshed for current semantics
  • Generated doc/api output is no longer tracked in the repository

Breaking Changes

  • Removed mutable-ordered-set, mutable-ordered-map, mutable-interval-set, mutable-interval-map
  • Removed transient/persistent! support (path-copying made it a no-op)
  • Renamed public function: rank-ofrank (returns nil instead of -1 for missing elements)
  • Public namespace root and Maven/Lein artifact coordinate are now ordered-collections

[0.1.2] - 2024

  • Documentation improvements
  • Minor bug fixes

[0.1.1] - 2024

  • Initial public release
  • Weight-balanced persistent binary trees
  • ordered-set, ordered-map, interval-set, interval-map
  • Efficient set operations (intersection, union, difference)
  • nth and indexOf in O(log n) time

Can you improve this documentation?Edit on GitHub

cljdoc builds & hosts documentation for Clojure/Script libraries

Keyboard shortcuts
Ctrl+kJump to recent docs
Move to previous article
Move to next article
Ctrl+/Jump to the search field
× close