link:example$src/docs/examples.clj[role=include]
Crux is a document database that provides you with a comprehensive means of traversing and querying across all of your documents and data without any need to define a schema ahead of time. This is possible because Crux is "schemaless" and automatically indexes the top-level fields in all of your documents to support efficient ad-hoc joins and retrievals. With these capabilities you can quickly build queries that match directly against the relations in your data without worrying too much about the shape of your documents or how that shape might change in future.
Crux is also a graph database. The central characteristic of a graph database is that it can support arbitrary-depth graph queries (recursive traversals) very efficiently by default, without any need for schema-level optimisations. Crux gives you the ability to construct graph queries via a Datalog query language and uses graph-friendly indexes to provide a powerful set of querying capabilities. Additionally, when Crux’s indexes are deployed directly alongside your application you are able to easily blend Datalog and code together to construct highly complex graph algorithms.
This page walks through many of the more interesting queries that run as part of Crux’s default test suite.
See the Crux query test file
for the full suite of query tests and how each test listed below runs in the wider context.
Extensible Data Notation (edn) is used as the data format for the public Crux APIs. To gain an understanding of edn see Essential EDN for Crux.
Note that all Crux Datalog queries run using a point-in-time view of the database which means the query capabilities and patterns presented in this section are not aware of valid times or transaction times.
A Datalog query consists of a set of variables and a set of clauses. The result of running a query is a result set of the possible combinations of values that satisfy all of the clauses at the same time. These combinations of values are referred to as "tuples".
The possible values within the result tuples are derived from your database of
documents. The documents themselves are represented in the database indexes as
"entity–attribute–value" (EAV) facts. For example, a single document
{:crux.db/id :myid :color "blue" :age 12}
is transformed into two facts
[[:myid :color "blue"][:myid :age 12]]
.
In the most basic case, a Datalog query works by searching for "subgraphs" in
the database that match the pattern defined by the clauses. The values within
these subgraphs are then returned according to the list of return variables
requested in the :find
vector within the query.
Our first query runs on a database that contains the following 3 documents which get broken apart and indexed as "entities":
link:example$src/docs/examples.clj[role=include]
Note that :ivan
, :petr
and :smith
are edn keywords, which may be used as
document IDs in addition to UUIDs.
The following query has 3 clauses, represented as edn vectors within the
:where
vector. These clauses constrain the result set to match only the
entity (or subgraph of interconnected entities) that satisfy all 3 clauses at
once:
link:example$src/docs/examples.clj[role=include]
Let’s try to work out what these 3 clauses do…
p1
and n
are logical variables (also referred to as "logic-vars" or simply "lvars"). Logic variables are often prefixed with ?
for clarity but this is optional.
[p1 :name n]
is looking for all entities that have a value under the
attribute of :name
and then binds the corresponding entity ID to p1
and the
corresponding value to n
. Since all 3 entities in our database have a :name
attribute, this clause alone will simply return all 3 entities.
[p1 :last-name n]
reuses the variable n
from the previous clause
which is significant because it constrains the query to only look for entities
where the value of :name
(from the first clause) is equal to the value of
:last-name
(from the second clause). Looking at documents that were processed
by our database there is only one possible entity that can be returned, because
it has the same values :name
and :last-name
.
[p1 :name "Smith"]
only serves to reinforce the conclusion from the previous
two clauses which is that the variable n
can only be matched against the
string "Smith"
within our database.
…so what is the actual result of the query? Well that is defined by the
:find
vector which states that only the values corresponding to p1
should
be returned, which in this case is simply :smith
(the keyword database ID for
the document relating to our protagonist "Smith Smith"). Results are returned
as an edn set, which means duplicate results will not appear.
Passing the query into crux.api/q
(see how to submit a query in 'Get Started'), you get an edn result set containing the value :smith
link:example$src/docs/examples.clj[role=include]
Crux queries can take a set of additional arguments, binding them to variables under the :in
key within the query. Below we have some examples of queries with the various kinds of bindings that :in
supports (ran on the data as entered into the database above).
link:example$src/docs/examples.clj[role=include]
In the above query, we parameterize the first-name
symbol, and pass in "Ivan" as our input, binding "Ivan" to first-name
in the query.
Result Set:
link:example$src/docs/examples.clj[role=include]
link:example$src/docs/examples.clj[role=include]
This query shows binding to a collection of inputs - in this case, binding first-name
to all of the different values in a collection of first-names.
Result Set:
link:example$src/docs/examples.clj[role=include]
link:example$src/docs/examples.clj[role=include]
In this query we are binding a set of variables to a single value each, passing in a collection as our input. In this case, we are passing a collection with a first-name
followed by a last-name
.
Result Set:
link:example$src/docs/examples.clj[role=include]
link:example$src/docs/examples.clj[role=include]
Here we see how we can extend the parameterisation to match using multiple fields at once by passing and destructuring a relation containing multiple tuples.
Result Set:
link:example$src/docs/examples.clj[role=include]
Something else we can do with arguments is apply predicates to them directly within the clauses. Predicates return either true
or false
but all predicates used in clauses must return true
in order for the given combination of field values to be part of the valid result set:
link:example$src/docs/examples.clj[role=include]
In this case, we have a list of ages as our input, and only 22
satisfies the predicate, (> age 21)
.
Result set:
link:example$src/docs/examples.clj[role=include]
Any fully qualified Clojure function that returns a boolean can be used in place of >
, for example:
link:example$src/docs/examples.clj[role=include]
Result set:
link:example$src/docs/examples.clj[role=include]
link:example$src/docs/examples.clj[role=include]
link:example$src/docs/examples.clj[role=include]
The following built-in aggregates are provided as additional syntax for use within the :find
clause and each accept at most one logical variable (e.g. ?lvar
in these examples):
(sum ?lvar)
accumulates as single value via the Clojure +
function
(min ?lvar)
and (max ?lvar)
return a single value via the Clojure compare
function which may operates on many types (integers, strings, collections etc.)
(min N ?lvar)
and (max N ?lvar)
returns a sorted set of the N minimum/maximum items. N must be a positive integer and cannot be referenced via an additional logic-var.
(count ?lvar)
returns a single count of all values including any duplicates
(count-distinct ?lvar)
returns a single count of all unique values
(avg ?lvar) returns a single value equivalent to `sum / count
(median ?lvar)
, (variance ?lvar)
and (stddev ?lvar)
return single values corresponding to the statistical definitions
(rand N ?lvar)
returns a vector of exactly N values, where some values may be duplicates if N is larger than the range
(sample N ?lvar)
returns a vector of at-most N distinct values
(distinct ?lvar)
returns a set of distinct values
Note there is always implicit grouping across aggregates due to how Crux performs the aggregation lazily before turning the result tuples into a set.
User-defined aggregates are supported by adding a new method (via Clojure defmethod
) for crux.query/aggregate
. For example:
(defmethod crux.query/aggregate 'sort-reverse [_]
(fn
([] [])
([acc] (vec (reverse (sort acc))))
([acc x] (conj acc x))))
Congratulations! You already know enough about queries to build a simple CRUD application with Crux. However, your manager has just told you that the new CRUD application you have been designing needs to backfill the historical document versions from the legacy CRUD application. Luckily Crux makes it easy for your application to both insert and retrieve these old versions.
Here we will see how you are able to run queries at a given point in the valid time axis against, implicitly, the most recent transaction time.
First, we transact a very old document into the database with the ID :malcolm
and the :name
"Malcolm"
, and specify the valid time
instant at which this document became valid in the legacy system: #inst "1986-10-22"
.
link:example$src/docs/examples.clj[role=include]
Next we transact a slightly more recent (though still very old!) revision of that same document where the :name
has been corrected to "Malcolma"
, again using a historical timestamp extracted from the legacy system.
link:example$src/docs/examples.clj[role=include]
We are then able to query at different points in the valid time axis to check for the validity of the correction. We define a query q
:
link:example$src/docs/examples.clj[role=include]
Firstly we can verify that "Malcolma"
was unknown at #inst "1986-10-23"
.
link:example$src/docs/examples.clj[role=include]
Result Set:
link:example$src/docs/examples.clj[role=include]
We can then verify that "Malcolma"
is the currently known :name
for the entity
with ID :malcolm
by simply not specifying a valid time alongside the query.
This will be the case so long as there are no newer versions (in the valid time axis)
of the document that affect the current valid time version.
link:example$src/docs/examples.clj[role=include]
Result Set:
link:example$src/docs/examples.clj[role=include]
Given the following documents in the database
link:example$src/docs/examples.clj[role=include]
We can run a query to return a set of tuples that satisfy the join on the attribute :name
link:example$src/docs/examples.clj[role=include]
Result Set:
link:example$src/docs/examples.clj[role=include]
Note that every person joins once, plus 2 more matches.
Given the following documents in the database
link:example$src/docs/examples.clj[role=include]
We can run a query to return a set of entities that :follows
the set of entities with the :name
value of "Ivan"
link:example$src/docs/examples.clj[role=include]
Result Set:
link:example$src/docs/examples.clj[role=include]
Note that because Crux is schemaless there is no need to have elsewhere declared that the :follows
attribute may take a value of edn type set
.
Within Crux queries, you can nest a subquery - and bind the results for further use in the query.
link:example$src/docs/examples.clj[role=include]
In the above query, we perform a subquery doing some arithmetic operations and returning the result - and bind the resulting relation as a scalar.
Result set:
link:example$src/docs/examples.clj[role=include]
link:example$src/docs/examples.clj[role=include]
Similar to the previous query, except we bind the resulting relation as a tuple.
Result set:
link:example$src/docs/examples.clj[role=include]
In this example, we bind the results of a subquery and use them to return another result.
link:example$src/docs/examples.clj[role=include]
Result set:
link:example$src/docs/examples.clj[role=include]
ALPHA - subject to change without warning between releases.
Crux queries support a 'projection' syntax, allowing you to decouple specifying which entities you want from what data you’d like about those entities in your queries. Crux’s support is based on the excellent EDN Query Language (EQL) library.
To specify what data you’d like about each entity, include a (eql/project ?logic-var projection-spec)
entry in the :find
clause of your query:
link:example$src/docs/examples.clj[role=include]
link:example$src/docs/examples.clj[role=include]
link:example$src/docs/examples.clj[role=include]
link:example$src/docs/examples.clj[role=include]
We can navigate to other entities (and hence build up nested results) using 'joins'.
Joins are specified in {}
braces in the projection-spec - each one maps one join key to its nested spec:
link:example$src/docs/examples.clj[role=include]
link:example$src/docs/examples.clj[role=include]
link:example$src/docs/examples.clj[role=include]
link:example$src/docs/examples.clj[role=include]
We can also navigate in the reverse direction, looking for entities that refer to this one, by prepending _
to the attribute name:
link:example$src/docs/examples.clj[role=include]
link:example$src/docs/examples.clj[role=include]
To rename attributes in the result, wrap the attribute in (:source-attribute {:as :output-name})
:
{:find [(eql/project ?profession [:profession/name {(:user/_profession {:as :users}) [:user/id :user/name]}])]
:where [[?profession :profession/name]]}
;; => [{:profession/name "Doctor",
;; :users [{:user/id 1, :user/name "Ivan"},
;; {:user/id 3, :user/name "Petr"}]},
;; {:profession/name "Lawyer",
;; :users [{:user/id 2, :user/name "Sergei"}]}]
You can quickly grab the whole document by specifying *
in the projection spec:
link:example$src/docs/examples.clj[role=include]
link:example$src/docs/examples.clj[role=include]
For full details on what’s supported in the projection-spec, see the EQL specification
A Datalog query naturally returns a result set of tuples, however, the tuples can also be consumed as a sequence and therefore you will always have an implicit order available. Ordinarily this implicit order is not meaningful because the join order and result order are unlikely to correlate.
The :order-by
option is available for use in the query map to explicitly
control the result order.
link:example$src/docs/examples.clj[role=include]
Use of :order-by
will typically require that results are fully-realised by
the query engine, however this happens transparently and it will automatically
spill to disk when sorting large result sets.
Basic :offset
and :limit
options are supported however typical pagination
use-cases will need a more comprehensive approach because :offset
will
naively scroll through the initial result set each time.
link:example$src/docs/examples.clj[role=include]
Pagination relies on efficient retrieval of explicitly ordered documents and this may be achieved using a user-defined attribute with values that get sorted in the desired order. You can then use this attribute within your Datalog queries to apply range filters using predicates.
link:example$src/docs/examples.clj[role=include]
Additionally, since Crux stores documents and can traverse arbitrary keys as
document references, you can model the ordering of document IDs with vector
values, e.g. {:crux.db/id :zoe :closest-friends [:amy :ben :chris]}
More powerful ordering and pagination features may be provided in the future. Feel free to open an issue or get in touch to discuss your requirements.
This example of a rule demonstrates a recursive traversal of entities that are
connected to a given entity via the :follow
attribute.
link:example$src/docs/examples.clj[role=include]
Query results can also be streamed, particularly for queries whose results may
not fit into memory. For these, we use crux.api/open-q
, which returns a
Closeable
sequence.
We’d recommend using with-open
to ensure that the sequence is closed properly.
Additionally, ensure that the sequence (as much of it as you need) is eagerly
consumed within the with-open
block - attempting to use it outside (either
explicitly, or by accidentally returning a lazy sequence from the with-open
block) will result in undefined behaviour.
link:example$src/docs/examples.clj[role=include]
Crux allows you to retrieve all versions of a given entity:
link:example$src/docs/examples.clj[role=include]
When retrieving the previous versions of an entity, you have the option to additionally return the documents associated with those versions (by using :with-docs?
in the additional options map)
link:example$src/docs/examples.clj[role=include]
Retrievable entity versions can be bounded by four time coordinates:
valid-time-start
tx-time-start
valid-time-end
tx-time-end
All coordinates are inclusive. All coordinates can be null.
link:example$src/docs/examples.clj[role=include]
Logic variables used in queries must always be quoted in the :find
and
:where
clauses, which in the most minimal case could look like the following:
(crux/q db
{:find ['?e]
:where [['?e :event/employee-code '?code]]}))
However it is often convenient to quote entire clauses or even the entire query map rather than each individual use of every logical variable, for instance:
(crux/q db
'{:find [?e]
:where [[?e :event/employee-code ?code]]}))
Confusion may arise when you later attempt to introduce references to Clojure
variables within your query map, such as when using :args
. This can be
resolved by introducing more granular quoting for specific parts of the query
map:
(let [my-code 101214]
(crux/q db
{:find '[?e]
:where '[[?e :event/employee-code ?code]]
:args [{'?code my-code}]}))
Say you have a document like so and you want to add it to a Crux db:
{:crux.db/id :me
:list ["carrots" "peas" "shampoo"]
:pockets {:left ["lint" "change"]
:right ["phone"]}}
Crux breaks down vectors into individual components so the query engine is able
see all elements on the base level. As a result of this the query engine is not
required to traverse any structures or any other types of search algorithm
which would slow the query down. The same thing should apply for maps so
instead of doing :pocket {:left thing :right thing}
you should put them under
a namespace, instead structuring the data as :pocket/left thing :pocket/right
thing
to put the data all on the base level. Like so:
(crux/submit-tx
node
[[:crux.tx/put
{:crux.db/id :me
:list ["carrots" "peas" "shampoo"]
:pockets/left ["lint" "change"]
:pockets/right ["phone"]}]
[:crux.tx/put
{:crux.db/id :you
:list ["carrots" "tomatoes" "wig"]
:pockets/left ["wallet" "watch"]
:pockets/right ["spectacles"]}]])
To query inside these vectors the code would be:
(crux/q (crux/db node) '{:find [e l]
:where [[e :list l]]
:args [{l "carrots"}]})
;; => #{[:you "carrots"] [:me "carrots"]}
(crux/q (crux/db node) '{:find [e p]
:where [[e :pockets/left p]]
:args [{p "watch"}]})
;; => #{[:you "watch"]}
Note that l
and p
is returned as a single element as Crux decomposes the
vector
This list is not necessarily exhaustive and is based on the partial re-usage of DataScript’s query test suite within Crux’s query tests.
Crux does not support:
vars in the attribute position, such as [e ?a "Ivan"]
or [e _ "Ivan"]
Crux does not yet support:
ground
, get-else
, get-some
, missing?
backref attribute syntax (i.e. [?child :example/_child ?parent]
)
Note that many advanced query features can be achieved via custom predicate function calls since you can currently reference any fully qualified function that is loaded. In future, limitations on available functions may be introduced to enforce security restrictions for remote query execution.
Test queries from DataScript such as "Rule with branches" and "Mutually recursive rules" work correctly with Crux and demonstrate advanced query patterns. See the Crux query tests for details.
Can you improve this documentation? These fine people already did:
James Henderson, Daniel Mason, Dan Mason, Steven Deobald & Jeremy TaylorEdit on GitHub
cljdoc is a website building & hosting documentation for Clojure/Script libraries
× close