idx
lets you treat your collection like a database.
This library provides clojure(script) data structures (maps, sets and vectors) that allow for secondary indexes. The indexes offer alternative fast access paths to their elements.
Supports both Clojure and ClojureScript.
(require '[com.wotbrew.idx :as idx])
(def indexed-vector (idx/auto [{:name "fred"} {:name "ethel"}]))
indexed-vector
;; =>
[{:name "fred"} {:name "ethel"}] ; the indexed structure will behave (and be printed!) like the original collection.
(idx/lookup indexed-vector :name "ethel")
;; =>
({:name "ethel"})
lein [com.wotbrew/idx "0.1.3"]
or deps com.wotbrew/idx {:mvn/version "0.1.3"}
.
conj
, assoc
and so on.It is common to have the problem of taking repeated linear lookups (as you might accomplish with filter
) to a sub-linear one. Typically what happens
is you use group-by
or write some code to transform your collection into some kind of map to allow for fast lookup of its elements.
There are 3 problems that idx
tries to solve:
-by-this
or -by-that
type locals (or worse args, or keys) that only serve as fast paths to your actual data.Lets say we have a large collection of orders, and a large collection of items, I want to join the groups of items for each order under the :items key.
Typical solution would look like this:
(let [item-idx (group-by :order-id items)]
(for [order orders]
(assoc order :items (item-idx (:order-id order)))))
Now this case is not too egregious, however in real code it is tempting to pass your indexes between functions, introducing accidental complexity to their signatures (the args would not be there if it were not for insufficient data structures). If you do not do that - you are often loosing gains by not sharing them as you repeatedly construct expensive indexes.
Furthermore we often have to structure our code around the indexes, they are not easy to add and remove independent of the usages.
If you are manually creating indexes and have to change your data, then you have to do that yourself.
There are many solutions in this space, but I have not seen one that does not convert your data into something else. You can look at this library as a drop-in you can use to supercharge your existing collections.
idx
is not trying to compete with databases like datascript. It is intending to compete with a proliferation of manual group-by
, index-by
style calls in order
to find data in your collections.
The performance of course is not as good as manual indexing, but should be good enough for most of the same use cases.
All functions are available in the com.wotbrew.idx
namespace.
; (index coll p kind ...)
;; e.g
(def coll [{:id 1, :name "fred", :created-at #inst "2018-03-14"} ...])
(index coll :id :idx/unique :name :idx/hash :created-at :idx/sort)
;=>
[{:id 1, :name "fred", :created-at #inst "2018-03-14"} ...]
index
returns a new indexed collection with the specified indexes (plus any that already existed).
You can also later remove indexes with delete-index
.
Kind is either:
:idx/hash
for fast lookup
calls:idx/unique
for fast identify
/ replace-by
calls.:idx/sort
for fast ascending
/ descending
calls.The resulting indexing collection meets the interface of its (map/vector/set) input, if you add/associate/remove elements the indexes will be maintained on your behalf.
(auto coll)
This returns an indexed collection that creates indexes (and caches them) on demand as you query.
This can be good in local contexts where you only want to specify indexes once where they are used.
Caveats to this approach:
conj
, assoc
, dissoc
and so on.delete-index
.You can still call index
on auto
collections if you want to force certain indexes ahead of time.
Once wrapped with index
or auto
, a small suite of functions is available to query your collection.
lookup
Uses a one-to-many hash index if available.
Returns sequences of (unsorted) elements. You pass a property to test and value to match.
Say you have a vector of users, each with an age key, you might do:
(lookup users :age 10)
Say you have a vector of numbers, and you want to find the negative ones, functions are properties.
(def numbers [-1,3,4,-5])
(lookup numbers neg? true)
;; =>
(-5, -1)
identify
Uses a one-to-one hash index if available.
identify
operates on unique properties and predicates. It will throw
if the property is non unique. Good for by-id type queries.
(def users [{:id 32344, :name "Fred"} ...])
(identify users :id 32344)
;; =>
{:id 32344, :name "Fred"}
ascending
, descending
Uses a one-to-many sorted index if available.
(def users [{:name "Alice", :age 42}
{:name "Bob", :age 30}
{:name "Barbara", :age 12}
{:name "Jim", :age 83}])
(ascending users :age > 30) ; ascending sort
;; =>
({:name "Alice", :age 42}, {:name "Jim", :age 83})
(descending users :age <= 30) ; descending sort
;; =>
({:name "Bob", :age 30}, {:name "Barbara", :age 12})
path
path
allows nested indexes, use it for nested indexes as if you
are indexing a get-in call.
(lookup orders (path :user :id) 32344)
match
match
composes properties to form composite indexes. match
returns a Predicate implementation so
you do not have to redundantly respecify the structure in the value position.
(lookup numbers (match neg? true, even? true))
They keys can be any property, and match nests.
This allows for some pretty extensive (and expensive!) indexes, but is useful to compose a couple of properties together for composite keys.
(lookup orders (match (path :user :id) 32444,
:carrier (match :country "uk" :available true)
:delivery-date #"2020-08-17"))
In order to index a match, either use an auto
coll or use the :idx/value
placeholder.
(index coll (match (path :user :id) :idx/value,
:carrier (match :country :idx/value :available :idx/value)
:delivery-date :idx/value))
pred
pred
creates a predicate that uses a truthy/falsey index. It can be used in the value position of matches.
(match :foo (pred :has-bar))
pred is also useful to promote a function so it can be used
in the predicate position of lookup
/ identify
/ replace-by
.
(lookup :foo (pred even?))
pk
pk
looks up the (primary) index or key of the matched element.
Uses a unique one-to-one hash index if one is available.
(pk [1,4,5,3] identity 5) ;; => 2
(pk {:foo 42, :bar 33} identity 33) ;; => bar
(pk [{:foo 42}, {:foo 33}] :foo 42) ;; => 0
pcomp
pcomp
allows for function style composition of properties, takes 2 properties and returns a property.
The property returned by (pcomp a b)
Will be looked each property in turn. (a (b element))
Any indexes (automatic or otherwise) against your collection will be maintained incrementally with new versions of the data structure.
Use normal clojure collection functions such as conj
, assoc
, dissoc
, disj
and into
.
Some handy functions are enabled due to the presence of indexes.
replace-by
Replaces an element in the collection identified by the property/value or predicate.
Uses a unique index if one is available (always true if you use auto)
e.g
(replace-by customers :email "foo@bar.com" new-customer)
unwrap
As maintaining indexes can become expensive if you want to make a lot of modifications, you can completely remove any indexing
with unwrap
.
(unwrap coll)
delete-index
Returns a new collection without the specified index(es), uses same index specification as index
(delete-index coll :foo :idx/hash)
idx
indexes elements against 'properties', which are objects implementing the Property protocol. By default
an object used as a property is looked up as a key in the element. Which works well for the common use case of querying by key.
Functions implement Property, they are not looked up as keys, but rather applied to the element.
There are a couple of useful property combinators match and pcomp.
(path prop1 prop2 ...)
.(get element o)
(get element o)
with (as-key o)
Several functions take predicates as arguments, a predicate composes a property with its expected value. This is how match
works.
You can lift any function into a truthy/falsey test with pred.
All taken on a 2015 MBP using bench
from criterium.
The performance goal is to get close enough to manual indexing speed using clojure data structures that it is a viable solution for most normal line-of-business code.
Creating a :idx/hash index, indexing #(mod % 10) over 10000 integers.
Execution time mean : 4.079791 ms
For comparison group-by
Execution time mean : 1.973795 ms
Creating a :idx/unique value index on 10000 integers.
Execution time mean : 1.314164 ms
Doing the same manually (persistent! (reduce-kv #(assoc! %1 %3 %2) (transient {}) sample))
Execution time mean : 1.450698 ms
Which is unsurprising as the code does identical work.
Creating a :idx/sort index is the most expensive, as shown in this benchmark for the same collection and property.
Execution time mean : 5.673653 ms
Lookups are relatively cheap in all cases when indexes exist. When a scan is necessary performance is close or the same as to (filter).
Lookups are somewhat slower than against equivalent maps, mostly because this library needs to contend with incremental index maintenance as collections change.
Looking up against a :idx/hash index.
Execution time mean : 118.559471 ns
For comparison get
against the equivalent (group-by) map, literally just a map lookup.
Execution time mean : 43.276000 ns
Looking up against a :idx/unique index with identify.
Execution time mean : 127.409999 ns
For comparison getting from the equiv map:
Execution time mean : 51.197166 ns
Modification of indexed collections is difficult to benchmark, it depends on the number (and kind) of indexes, the properties you are testing (if functions).
It will almost always be the case that maintaining an index is more expensive than maintaining the original collection. In the case of vectors, the difference is stark, as vector writes are much cheaper than the map writes that indexes require.
Conj onto 10000 integer vector with a unique value index
Execution time mean : 307.481521 ns
For comparison, conj + manual indexing in an equivalent value to element map.
Execution time mean : 188.853768 ns
Copyright © 2020 Daniel Stone
Distributed under the MIT license.
Can you improve this documentation?Edit on GitHub
cljdoc is a website building & hosting documentation for Clojure/Script libraries
× close