Liking cljdoc? Tell your friends :D

# com.dean.interval-tree

This library provides a collection of data structures implemented using a modular, extensible, foldable, weight balanced persistent binary tree: ordered-sets, ordered-maps, interval-sets, and interval-maps.  ### Usage

To install, add the following dependency to your project or build file:

``````[com.dean/interval-tree "0.1.2"]
``````

#### Public API

The public api resides in the top-level `com.dean.interval-tree.core` namespace:

``````(require '[com.dean.interval-tree.core :as dean])
``````

The basic operation of this library is as a drop-in replacement for `clojure.core/sorted-set` and `clojure.core/sorted-map`.

#### Constructors

• `(dean/ordered-set coll)`
• `(dean/ordered-set-by pred coll)`
• `(dean/ordered-map coll)`
• `(dean/ordered-map-by pred coll)`
• `(dean/interval-set coll)`
• `(dean/interval-map coll)`

### Topics

#### What is an Interval Map?

Imagine you'd like to associate values with members of a set of intervals over some continuous domain such as time or real numbers. An example of this is shown below. An interval map answers the question, which intervals overlap at some point on the domain. At 3.14159, in this case, would be `x4` and `x7`. The interval map is sparse itself, of course, and would only need to contain the 8 constituent intervals.

`````` x8:                         +-----+
x7:                   +-----------------------------------+
x6:                                                       +
x5:                                     +-----------+
x4: +-----------------------------+
x3:                                                 +-----+
x2:                         +-----------------+
x1:       +-----------+

0=====1=====2=====3=====4=====5=====6=====7=====8=====9
``````

This corresponds to the following example code:

``````
(def x (dean/interval-map {[1 3] :x1
[4 7] :x2
[8 9] :x3
[0 5] :x4
[6 8] :x5
[9 9] :x6
[3 9] :x7
[4 5] :x8}))

(x 3.141592654) ;; =>  [:x4 :x7]
(x [5 5])       ;; =>  [:x4 :x7 :x8 :x2]

(get x 9)       ;; =>  [:x7 :x3 :x6]
(get x 9.00001) ;; =>  nil
(get x [1 4])   ;; =>  [:x4 :x1 :x7 :x8 :x2]

``````

#### Efficient Set Operations

This library implements a diverse collection of efficent set operations on foldably parallel ordered sets:

``````  (def foo (shuffle (range 500000)))
(def bar (shuffle (range 1000000)))

(def s0 (shuffle (range 0 1000000 2)))
(def s1 (shuffle (range 0 1000000 3)))

;;
;;; dean/ordered-set
;;

(time (def x (ordered-set foo)))         ;; 500K: "Elapsed time: 564.248517 msecs"
(time (def y (ordered-set bar)))         ;;   1M: "Elapsed time: 1187.734211 msecs"

(time (def s (dean/intersection
(ordered-set s0)
(ordered-set s1))))       ;; 833K: "Elapsed time: 1242.961445 msecs"

(time (r/fold + + y))                    ;;   1M: "Elapsed time: 54.363545 msecs"

;;
;;; clojure.core/sorted-set
;;

(time (def v (into (sorted-set) foo)))   ;; 500K: "Elapsed time: 839.188189 msecs"
(time (def w (into (sorted-set) bar)))   ;;   1M: "Elapsed time: 1974.798286 msecs"

(time (def s (clojure.set/intersection
(into (sorted-set) s0)
(into (sorted-set) s1)))) ;; 833K: "Elapsed time: 1589.786106 msecs"

(time (r/fold + + w))                    ;;   1M: "Elapsed time: 167.916539 msecs"
``````

### Testing

Testing is accomplished with the standard `lein test`

``````\$ time lein test

lein test com.dean.interval-tree.interval-map-test

lein test com.dean.interval-tree.interval-set-test

lein test com.dean.interval-tree.interval-test

lein test com.dean.interval-tree.ordered-map-test

lein test com.dean.interval-tree.ordered-set-test

lein test com.dean.interval-tree.tree-test

Ran 30 tests containing 98214 assertions.
0 failures, 0 errors.

real     5m34.487s
user    10m21.397s
sys      0m5.047s
``````

### Modularity

This data structure library is designed around the following concepts of modularity and extensibility.

#### Clojure/Java Interfaces

The top level collections are built on the standard Clojure/Java interfaces, so, for example, working with an `ordered-set` is identical to working with Clojure's `sorted-set`, using all of the same standard collection functions, for the 99% case: meta, nth, seq, rseq, assoc(-in), get(-in), invoke, compare, to-array, empty, .indexOf, .lastIndexof, size, iterator-seq, first, last, =, count, empty, contains, conj. disj, cons, fold, and many old friends will just work, using an efficient implementation that takes full advantage of the capabilities of our underlying tree index.

#### PExtensibleset

An exception to the above is due to the fact that `clojure.set` does not provide interfaces for extensible sets. So, we provide our own intersection, union, difference, subset, and superset. These operators work most efficiently on com.dean.interval-tree collections and provide support for backward interoperability with clojure (or possibly other) set datatypes.

#### Root Container

The individual collection types (ordered-set, ordered-map, interval-set, interval-map} are defined by their individual Class (clojure `deftype`) of top level container that holds the root of an indexed tree. This container describes the behavior of the underlying tree data structure along several architectural dimensions.

##### INodeCollection

The fundamental collection of nodes provides an interface to node allocation machinery and to the root contained node. A variant based on persistent (on-disk) storage, for example, has been built with customizations at this layer.

##### IBalancedCollection

For functional balanced trees, provides an interface to the `stitch` function that returns a new, properly balanced tree containing one newly allocated node adjoined. The provided algorithm is weight balanced however others may be used. We've experimented with red-black trees, in particular, as variants at this layer.

##### IOrderedCollection

Ordered collections define a comparator and predicates to determine the underlying algorithmic compatibility of other collections. Interval Collections are a special type of OrderedCollection.

#### Tree

The heart of the library is our persistent tree.

The code is well documented and explains in more detail the efficiencies of the internal collection operators.

This species of binary tree supports representations of sets, maps, and vectors. In addition to indexed key and range query, it supports the `nth` operation to return nth node from the beginning of the ordered tree, and `node-rank` to return the rank (sequential position) of a given key within the ordered tree, both in logarithmic time.

The axes of exstensibility of the tree implemntation (`*compare*`,`*n-join*`, `*t-join*`) correspond to the interfaces described above.

### Inspiration

This implementation of a weight-balanced binary interval-tree data structure was inspired by the following: 