Queries

Introduction

Crux is a document database that provides you with a comprehensive means of traversing and querying across all of your documents and data without any need to define a schema ahead of time. This is possible because Crux is "schemaless" and automatically indexes the top-level fields in all of your documents to support efficient ad-hoc joins and retrievals. With these capabilities you can quickly build queries that match directly against the relations in your data without worrying too much about the shape of your documents or how that shape might change in future.

Crux is also a graph database. The central characteristic of a graph database is that it can support arbitrary-depth graph queries (recursive traversals) very efficiently by default, without any need for schema-level optimisations. Crux gives you the ability to construct graph queries via a Datalog query language and uses graph-friendly indexes to provide a powerful set of querying capabilities. Additionally, when Crux’s indexes are deployed directly alongside your application you are able to easily blend Datalog and code together to construct highly complex graph algorithms.

Extensible Data Notation (edn) is used as the data format for the public Crux APIs. To gain an understanding of edn see Essential EDN for Crux.

Note that all Crux Datalog queries run using a point-in-time view of the database which means the query capabilities and patterns presented in this section are not aware of valid times or transaction times.

A Datalog query consists of a set of variables and a set of clauses. The result of running a query is a result set of the possible combinations of values that satisfy all of the clauses at the same time. These combinations of values are referred to as "tuples".

The possible values within the result tuples are derived from your database of documents. The documents themselves are represented in the database indexes as "entity–attribute–value" (EAV) facts. For example, a single document {:crux.db/id :myid :color "blue" :age 12} is transformed into two facts [[:myid :color "blue"][:myid :age 12]].
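The decomposition of a document into facts can be pictured with a few lines of plain Clojure. This is only an illustrative sketch: doc->facts is a hypothetical helper, not part of the Crux API, and it says nothing about how Crux actually stores its indexes. It also assumes, in line with the "Maps and Vectors in data" tip later in this page, that top-level vector and set values yield one fact per element.

```clojure
(defn doc->facts
  "Illustrative only: turns a document map into a vector of
  [entity attribute value] triples."
  [doc]
  (let [eid (:crux.db/id doc)]
    (vec
     (for [[a v] (dissoc doc :crux.db/id)
           ;; assumption for this sketch: vectors and sets are split
           ;; into one fact per element, anything else is a single value
           x (if (or (vector? v) (set? v)) v [v])]
       [eid a x]))))

(set (doc->facts {:crux.db/id :myid :color "blue" :age 12}))
;; => #{[:myid :color "blue"] [:myid :age 12]}
```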

In the most basic case, a Datalog query works by searching for "subgraphs" in the database that match the pattern defined by the clauses. The values within these subgraphs are then returned according to the list of return variables requested in the :find vector within the query.

Basic Structure

A query in Crux is performed by calling crux/q on a Crux database snapshot with a quoted map and, optionally, additional arguments.

link:example$test/crux/docs/examples/query_test.clj[role=include]
1. Database value. Usually, the snapshot view comes from calling crux/db on a Crux node
2. Query map (vector style queries are also supported)
3. Argument(s) supplied to the :in relations
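As a minimal end-to-end sketch of this call shape, the following starts a throwaway node, stores two documents and runs one query. It assumes that crux.api is on the classpath and that (crux/start-node {}) starts an in-memory node, which holds for recent Crux 1.x releases but may differ in yours. The function name find-by-name and the sample documents are made up for illustration.

```clojure
(require '[crux.api :as crux])

(defn find-by-name
  "Returns the set of [id] tuples for entities whose :name is `first-name`."
  [node first-name]
  (crux/q (crux/db node)          ; database value (snapshot)
          '{:find [p]             ; quoted query map
            :in [n]
            :where [[p :name n]]}
          first-name))            ; argument bound by :in

(def result
  ;; assumption: an empty options map starts an in-memory node
  (with-open [node (crux/start-node {})]
    ;; wait until the transaction is indexed before querying
    (crux/await-tx
     node
     (crux/submit-tx node [[:crux.tx/put {:crux.db/id :ivan :name "Ivan"}]
                           [:crux.tx/put {:crux.db/id :petr :name "Petr"}]]))
    (find-by-name node "Ivan")))
;; result => #{[:ivan]}
```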

The query map accepts the following keys:

Table 1. Query Keys
| Key | Type | Purpose |
|---|---|---|
| :find | Vector | Specify values to be returned |
| :where | Vector | Restrict the results of the query |
| :in | Vector | Specify external arguments |
| :order-by | Vector | Control the result order |
| :limit | Int | Specify how many results to return |
| :offset | Int | Specify how many results to discard |
| :rules | Vector | Define powerful statements for use in :where clauses |
| :timeout | Int | Specify maximum query run time in ms |
| :full-results? | Boolean | Specify whether to return full documents |

Find

The find clause of a query specifies which values are returned. These will be returned as a list.

Logic Variable

You can directly specify a logic variable from your query. The following will return all last names.

link:example$test/crux/docs/examples/query_test.clj[role=include]

Aggregates

You can specify an aggregate function to apply to at most one logic variable.

Table 2. Built-in Aggregate Functions

| Usage | Description |
|---|---|
| (sum ?lvar) | Accumulates as a single value via the Clojure + function |
| (min ?lvar), (max ?lvar) | Return a single value via the Clojure compare function which may operate on many types (integers, strings, collections etc.) |
| (count ?lvar) | Return a single count of all values including any duplicates |
| (avg ?lvar) | Return a single value equivalent to sum / count |
| (median ?lvar), (variance ?lvar), (stddev ?lvar) | Return a single value corresponding to the statistical definition |
| (rand N ?lvar) | Return a vector of exactly N values, where some values may be duplicates if N is larger than the range |
| (sample N ?lvar) | Return a vector of at-most N distinct values |
| (distinct ?lvar) | Return a set of distinct values |

link:example$test/crux/docs/examples/query_test.clj[role=include]

link:example$test/crux/docs/examples/query_test.clj[role=include]

Note there is always implicit grouping across aggregates due to how Crux performs the aggregation lazily before turning the result tuples into a set.

User-defined aggregates are supported by adding a new method (via Clojure defmethod) for crux.query/aggregate. For example:

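(defmethod crux.query/aggregate 'sort-reverse [_]
  (fn
    ;; init: start with an empty accumulator
    ([] [])
    ;; completion: sort the accumulated values in descending order
    ([acc] (vec (reverse (sort acc))))
    ;; step: add the next value to the accumulator
    ([acc x] (conj acc x))))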
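The three arities of the function returned above (no arguments for the initial value, one argument for completion, two arguments for each step) have the same shape as an ordinary Clojure reducing function. As a sketch, you can try such a function outside of Crux with transduce, which calls the arities in that order. The name sort-reverse-rf is hypothetical, and this does not claim to show how Crux itself invokes the aggregate.

```clojure
(def sort-reverse-rf
  "Stand-alone copy of the reducing function used by the aggregate."
  (fn
    ([] [])                             ; init
    ([acc] (vec (reverse (sort acc))))  ; completion
    ([acc x] (conj acc x))))            ; step

;; transduce calls (rf) once, (rf acc x) per element, then (rf acc)
(transduce identity sort-reverse-rf [3 1 2])
;; => [3 2 1]
```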

EQL Projection

WARNING: ALPHA - subject to change without warning between releases.

Crux queries support a 'projection' syntax, allowing you to decouple specifying which entities you want from what data you’d like about those entities in your queries. Crux’s support is based on the excellent EDN Query Language (EQL) library.

To specify what data you’d like about each entity, include a (eql/project ?logic-var projection-spec) entry in the :find clause of your query:

link:example$test/crux/docs/examples/query_test.clj[role=include]
link:example$test/crux/docs/examples/query_test.clj[role=include]

link:example$test/crux/docs/examples/query_test.clj[role=include]
link:example$test/crux/docs/examples/query_test.clj[role=include]

If you have the entity ID(s) in hand, you can call project or project-many directly:

link:example$test/crux/docs/examples/query_test.clj[role=include]
link:example$test/crux/docs/examples/query_test.clj[role=include]
link:example$test/crux/docs/examples/query_test.clj[role=include]
link:example$test/crux/docs/examples/query_test.clj[role=include]

We can navigate to other entities (and hence build up nested results) using 'joins'. Joins are specified in {} braces in the projection-spec - each one maps one join key to its nested spec:

link:example$test/crux/docs/examples/query_test.clj[role=include]
link:example$test/crux/docs/examples/query_test.clj[role=include]

link:example$test/crux/docs/examples/query_test.clj[role=include]
link:example$test/crux/docs/examples/query_test.clj[role=include]

We can also navigate in the reverse direction, looking for entities that refer to this one, by prepending _ to the attribute name:

link:example$test/crux/docs/examples/query_test.clj[role=include]
link:example$test/crux/docs/examples/query_test.clj[role=include]

You can quickly grab the whole document by specifying * in the projection spec:

link:example$test/crux/docs/examples/query_test.clj[role=include]
link:example$test/crux/docs/examples/query_test.clj[role=include]

Attribute parameters

Crux supports a handful of custom EQL parameters, specified by wrapping the :attribute key in a pair: (:attribute {:param :value, …​}).

  • :as - to rename attributes in the result, wrap the attribute in (:source-attribute {:as :output-name}):

    {:find [(eql/project ?profession [:profession/name
                                      {(:user/_profession {:as :users}) [:user/id :user/name]}])]
     :where [[?profession :profession/name]]}
    
    ;; => [{:profession/name "Doctor",
    ;;      :users [{:user/id 1, :user/name "Ivan"},
    ;;              {:user/id 3, :user/name "Petr"}]},
    ;;     {:profession/name "Lawyer",
    ;;      :users [{:user/id 2, :user/name "Sergei"}]}]
  • :limit - limit the number of values returned under the given property/join: (:attribute {:limit 5})

  • :default - specify a default value if the matched document doesn’t contain the given attribute: (:attribute {:default "default"})

  • :into - specify the collection to pour the results into: (:attribute {:into #{}})

    {:find [(eql/project ?profession [:profession/name
                                      {(:user/_profession {:as :users, :into #{}})
                                       [:user/id :user/name]}])]
     :where [[?profession :profession/name]]}
    
    ;; => [{:profession/name "Doctor",
    ;;      :users #{{:user/id 1, :user/name "Ivan"},
    ;;               {:user/id 3, :user/name "Petr"}}},
    ;;     {:profession/name "Lawyer",
    ;;      :users #{{:user/id 2, :user/name "Sergei"}}}]
  • :cardinality (reverse joins) - by default, reverse joins put their values in a collection - for many-to-one/one-to-one reverse joins, specify {:cardinality :one} to return a single value.

For full details on what’s supported in the projection-spec, see the EQL specification.

Returning maps

To return maps rather than tuples, supply the map keys under :keys for keywords, :syms for symbols, or :strs for strings:

link:example$test/crux/docs/examples/query_test.clj[role=include]
link:example$test/crux/docs/examples/query_test.clj[role=include]

Where

The :where section of a query limits the combinations of possible results by satisfying all clauses and rules in the supplied vector against the database (and any :in relations).

Table 3. Valid Clauses

| Name | Description |
|---|---|
| triple clause | Restrict using EAV indexes |
| predicate | Restrict with any predicate |
| range predicate | Restrict with any of <, <=, >=, >, = |
| unification predicate | Unify two distinct logic variables with != or == |
| not rule | Negate a list of clauses |
| not-join rule | Not rule with its own scope |
| or rule | Restrict on at least one matching clause |
| or-join rule | Or with its own scope |
| defined rule | Restrict with a user-defined rule |

Triple

A triple clause is a vector of a logic variable, a document key and (optionally) a value to match, which can be a literal or another logic variable.

It restricts results by matching EAV facts.

link:example$test/crux/docs/examples/query_test.clj[role=include]

link:example$test/crux/docs/examples/query_test.clj[role=include]

link:example$test/crux/docs/examples/query_test.clj[role=include]
1. This matches all entities, p, which have a :name field.
2. This matches all entities, p, which have a :name of "Ivan".
3. This matches all entities, p, which have a :name which matches the :last-name of q.

Predicates

Any fully qualified Clojure function that returns a boolean can be used as a "filter" predicate clause.

Predicate clauses must be placed in a clause, i.e. with a surrounding vector.

link:example$test/crux/docs/examples/query_test.clj[role=include]

This matches all entities, p, which have an odd :age.

Subqueries

You can nest a subquery within a :where clause to bind the result for further use in the query.

Binding results as a scalar

link:example$test/crux/docs/examples/query_test.clj[role=include]

In the above query, we perform a subquery doing some arithmetic operations and returning the result - and bind the resulting relation as a scalar.

Result set:

link:example$test/crux/docs/examples/query_test.clj[role=include]

Binding results as a tuple

link:example$test/crux/docs/examples/query_test.clj[role=include]

Similar to the previous query, except we bind the resulting relation as a tuple.

Result set:

link:example$test/crux/docs/examples/query_test.clj[role=include]

In this example, we bind the results of a subquery and use them to return another result.

link:example$test/crux/docs/examples/query_test.clj[role=include]

Result set:

link:example$test/crux/docs/examples/query_test.clj[role=include]

Any fully qualified Clojure function can also be used to return relation bindings in this way, by returning a list, set or vector.

Range Predicate

A range predicate is a vector containing a list of a range operator and then two logic variables or literals.

Allowed range operators are <, <=, >=, >, and =.

link:example$test/crux/docs/examples/query_test.clj[role=include]

link:example$test/crux/docs/examples/query_test.clj[role=include]

link:example$test/crux/docs/examples/query_test.clj[role=include]
1. Finds any entity, p, with an :age which is greater than 18
2. Finds any entity, p, with an :age which is greater than the :age of any entity
3. Finds any entity, p, for which 18 is greater than p’s :age

Unification Predicate

Use a unification predicate, either == or !=, to constrain two independent logic variables. Literals (and sets of literals) can also be used in place of one of the logic variables.

;; Find all pairs of people with the same age:

[[p :age a]
 [p2 :age a2]
 [(== a a2)]]

;; ...is approximately equivalent to...

[[p :age a]
 [p2 :age a]]

;; Find all pairs of people with different ages:

[[p :age a]
 [p2 :age a2]
 [(!= a a2)]]

;; ...is approximately equivalent to...

[[p :age a]
 [p2 :age a2]
 (not [(= a a2)])]

Not

The not clause rejects a graph if all the clauses within it are true.

link:example$test/crux/docs/examples/query_test.clj[role=include]

link:example$test/crux/docs/examples/query_test.clj[role=include]

link:example$test/crux/docs/examples/query_test.clj[role=include]
1. Data
2. Query
3. Result

This will match any document which does not have a :name of "Ivan" and a :last-name of "Ivanov".

Not Join

The not-join rule allows you to restrict the possibilities for logic variables by asserting that there does not exist a match for a given sequence of clauses.

You declare which logic variables from outside the not-join scope are to be used in the join.

Any other logic variables within the not-join are scoped only for the join.

link:example$test/crux/docs/examples/query_test.clj[role=include]

link:example$test/crux/docs/examples/query_test.clj[role=include]

link:example$test/crux/docs/examples/query_test.clj[role=include]
1. Data
2. Declaration of which logic variables need to unify with the rest of the query
3. Clauses
4. Result

This will match any entity, p, which has different values for the :name and :last-name fields.

Importantly, the logic variable n is unbound outside the not-join clause.
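A query of the kind described here might look like the following sketch. The name different-names-query is hypothetical, and the query is held as plain quoted data so that it can be inspected on its own. It would be passed to crux/q together with a database value, and the result depends entirely on the documents you have stored.

```clojure
(def different-names-query
  '{:find [e]
    :where [[e :crux.db/id]
            ;; e is shared with the outer query, n exists only inside the not-join
            (not-join [e]
                      [e :last-name n]
                      [e :name n])]})

;; Hypothetical usage, assuming `db` is a Crux database value:
;; (crux/q db different-names-query)
```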

Or

An or clause is satisfied if any of its legs are satisfied.

link:example$test/crux/docs/examples/query_test.clj[role=include]

link:example$test/crux/docs/examples/query_test.clj[role=include]

link:example$test/crux/docs/examples/query_test.clj[role=include]
1. Data
2. Query
3. Result

This will match any document, p, which has a :last-name of "Ivanov" or "Ivannotov".

When within an or rule, you can use and to group clauses into a single leg (which must all be true).

link:example$test/crux/docs/examples/query_test.clj[role=include]

Or Join

The or-join clause allows you to restrict the possibilities for logic variables by asserting that at least one of a given set of legs matches, while giving the clauses inside it their own scope.

You declare which logic variables from outside the or-join scope are to be used in the join.

Any other logic variables within the or-join are scoped only for the join.

link:example$test/crux/docs/examples/query_test.clj[role=include]

link:example$test/crux/docs/examples/query_test.clj[role=include]

link:example$test/crux/docs/examples/query_test.clj[role=include]
1. Data
2. Declaration of which logic variables need to unify with the rest of the query
3. Clauses
4. Result

This will match any document, p, which has an :age greater than or equal to 18 or has a :name of "Ivan".

Importantly, the logic variable a is unbound outside the or-join clauses.
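As a sketch, a query matching this description could be written as follows. The name adult-or-ivan-query is hypothetical and the query is shown as plain quoted data. Note how and groups the two clauses that make up the first leg.

```clojure
(def adult-or-ivan-query
  '{:find [p]
    :where [[p :crux.db/id]
            ;; p is shared with the outer query, a exists only inside the or-join
            (or-join [p]
                     (and [p :age a]
                          [(>= a 18)])
                     [p :name "Ivan"])]})

;; Hypothetical usage, assuming `db` is a Crux database value:
;; (crux/q db adult-or-ivan-query)
```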

In

Crux queries can take a set of additional arguments, binding them to variables under the :in key within the query.

:in supports various kinds of binding.

Scalar binding

link:example$test/crux/docs/examples/query_test.clj[role=include]

In the above query, we parameterize the first-name symbol, and pass in "Ivan" as our input, binding "Ivan" to first-name in the query.

Result Set:

link:example$test/crux/docs/examples/query_test.clj[role=include]

Collection binding

link:example$test/crux/docs/examples/query_test.clj[role=include]

This query shows binding to a collection of inputs - in this case, binding first-name to all of the different values in a collection of first-names.

Result Set:

link:example$test/crux/docs/examples/query_test.clj[role=include]

Tuple binding

link:example$test/crux/docs/examples/query_test.clj[role=include]

In this query we are binding a set of variables to a single value each, passing in a collection as our input. In this case, we are passing a collection with a first-name followed by a last-name.

Result Set:

link:example$test/crux/docs/examples/query_test.clj[role=include]

Relation binding

link:example$test/crux/docs/examples/query_test.clj[role=include]

Here we see how we can extend the parameterisation to match using multiple fields at once by passing and destructuring a relation containing multiple tuples.

Result Set:

link:example$test/crux/docs/examples/query_test.clj[role=include]
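The four binding forms can be compared side by side in the sketch below. The queries are held as plain quoted data under hypothetical names, and the commented calls show the shape of argument each form expects. The attribute names and sample values are illustrative only.

```clojure
(def scalar-binding
  '{:find [e]
    :in [first-name]
    :where [[e :name first-name]]})
;; (crux/q db scalar-binding "Ivan")

(def collection-binding
  '{:find [e]
    :in [[first-name ...]]
    :where [[e :name first-name]]})
;; (crux/q db collection-binding ["Ivan" "Petr"])

(def tuple-binding
  '{:find [e]
    :in [[first-name last-name]]
    :where [[e :name first-name]
            [e :last-name last-name]]})
;; (crux/q db tuple-binding ["Ivan" "Ivanov"])

(def relation-binding
  '{:find [e]
    :in [[[first-name last-name]]]
    :where [[e :name first-name]
            [e :last-name last-name]]})
;; (crux/q db relation-binding [["Ivan" "Ivanov"] ["Petr" "Petrov"]])
```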

Ordering and Pagination

A Datalog query naturally returns a result set of tuples. However, the tuples can also be consumed as a sequence, so an implicit order is always available. Ordinarily this implicit order is not meaningful because the join order and result order are unlikely to correlate.

The :order-by option is available for use in the query map to explicitly control the result order.

link:example$test/crux/docs/examples/query_test.clj[role=include]

Use of :order-by will typically require that results are fully realised by the query engine. This happens transparently, and the sort will automatically spill to disk when sorting large numbers of results. Ordered results are returned as bags, not sets, so you may wish to deduplicate consecutive identical result tuples (e.g. using clojure.core/dedupe or similar).

Basic :offset and :limit options are supported; however, typical pagination use-cases will need a more comprehensive approach because :offset will naively scroll through the initial result set each time.

link:example$test/crux/docs/examples/query_test.clj[role=include]

Pagination relies on efficient retrieval of explicitly ordered documents and this may be achieved using a user-defined attribute with values that get sorted in the desired order. You can then use this attribute within your Datalog queries to apply range filters using predicates.

link:example$test/crux/docs/examples/query_test.clj[role=include]
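One way to picture the range-filter approach is the sketch below, which builds a query for "the next page after the last value already seen". The attribute :created-at, the function name page-query and the variable names are hypothetical, and the sketch assumes that a range predicate can compare a logic variable with a value supplied through :in.

```clojure
(defn page-query
  "Builds a query map for the next `page-size` results, ordered by the
  (hypothetical) :created-at attribute. The value to continue from is
  supplied as the `after` argument when the query is run."
  [page-size]
  {:find '[e created]
   :in '[after]
   :where '[[e :created-at created]
            [(> created after)]]
   :order-by '[[created :asc]]
   :limit page-size})

;; Hypothetical usage, assuming `db` is a Crux database value and
;; `last-seen` is the :created-at value of the final row of the previous page:
;; (crux/q db (page-query 10) last-seen)
```

Compared with :offset, each page starts from a known value rather than re-scanning the rows that were already returned.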

Additionally, since Crux stores documents and can traverse arbitrary keys as document references, you can model the ordering of document IDs with vector values, e.g. {:crux.db/id :zoe :closest-friends [:amy :ben :chris]}

More powerful ordering and pagination features may be provided in the future. Feel free to open an issue or get in touch to discuss your requirements.

Rules

Rules are defined by a rule head and then clauses as you would find in a :where statement.

They can be used as a shorthand for when you would otherwise be repeating the same restrictions in your :where statement.

link:example$test/crux/docs/examples/query_test.clj[role=include]
1. Rule usage clause (i.e. invocation)
2. Rule head (i.e. signature)
3. Rule body containing one or more clauses

The above defines the rule adult?, which checks that the supplied entity has an :age which is >= 18.

Multiple rule bodies may be defined for a single rule (i.e. with matching rule heads) which works in a similar fashion to an or-join.

The clauses within rules can themselves be rule invocation clauses. This allows for the recursive traversal of entities.

link:example$test/crux/docs/examples/query_test.clj[role=include]

This example finds all entities that the entity with :name "Smith" is connected to via :follow, even if the connection is via intermediaries.
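A recursive rule of this kind can be sketched as follows. The name followed-by-smith-query is hypothetical and the query is shown as plain quoted data. The rule follow has two bodies: the first covers a direct :follow link and the second reaches further entities by calling the rule again through an intermediary.

```clojure
(def followed-by-smith-query
  '{:find [?e2]
    :where [(follow ?e1 ?e2)
            [?e1 :name "Smith"]]
    :rules [;; direct connection
            [(follow ?e1 ?e2)
             [?e1 :follow ?e2]]
            ;; indirect connection via an intermediary ?t
            [(follow ?e1 ?e2)
             [?e1 :follow ?t]
             (follow ?t ?e2)]]})

;; Hypothetical usage, assuming `db` is a Crux database value:
;; (crux/q db followed-by-smith-query)
```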

Timeout

:timeout sets the maximum run time of the query (in milliseconds).

If the query has not completed by this time, a java.util.concurrent.TimeoutException is thrown.

Full Results

Setting the :full-results? flag to true will cause logic variables in the :find clause to return the full document.

link:example$test/crux/docs/examples/query_test.clj[role=include]

link:example$test/crux/docs/examples/query_test.clj[role=include]
link:example$test/crux/docs/examples/query_test.clj[role=include]

link:example$test/crux/docs/examples/query_test.clj[role=include]
link:example$test/crux/docs/examples/query_test.clj[role=include]

Valid Time travel

When performing a query, crux/q is called on a database snapshot.

To query based on a different Valid Time, create this snapshot by specifying the desired Valid Time when we call db on the node.

link:example$test/crux/docs/examples/query_test.clj[role=include]

link:example$test/crux/docs/examples/query_test.clj[role=include]

Here, we have put different documents in Crux with different Valid Times.

link:example$test/crux/docs/examples/query_test.clj[role=include]

Here, we have defined a query, q, to find all entities with a :name of "Malcolma" and a :last-name of "Sparks".

We can run the query at different Valid Times as follows:

link:example$test/crux/docs/examples/query_test.clj[role=include]

link:example$test/crux/docs/examples/query_test.clj[role=include]

The first query will return an empty result set (#{}) because there isn’t a document with the :name "Malcolma" valid at #inst "1986-10-23".

The second query will return #{[:malcolm]} because the document with :name "Malcolma" is valid at the current time. This will be the case so long as there are no newer versions (in the valid time axis) of the document that affect the current valid time version.
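The whole walkthrough can be sketched end to end as below. The two documents and their Valid Times (1986-10-22 and 1986-10-24) are assumptions chosen to be consistent with the results described above, not a copy of the original example. The sketch also assumes that (crux/start-node {}) starts an in-memory node and that crux/db accepts a Valid Time as its second argument.

```clojure
(require '[crux.api :as crux])

(def malcolma-query
  '{:find [e]
    :where [[e :name "Malcolma"]
            [e :last-name "Sparks"]]})

(def results
  ;; assumption: an empty options map starts an in-memory node
  (with-open [node (crux/start-node {})]
    (crux/await-tx
     node
     (crux/submit-tx
      node
      ;; two versions of the same entity with different (assumed) Valid Times
      [[:crux.tx/put {:crux.db/id :malcolm :name "Malcolm" :last-name "Sparks"}
        #inst "1986-10-22"]
       [:crux.tx/put {:crux.db/id :malcolm :name "Malcolma" :last-name "Sparks"}
        #inst "1986-10-24"]]))
    {:in-1986 (crux/q (crux/db node #inst "1986-10-23") malcolma-query)
     :now     (crux/q (crux/db node) malcolma-query)}))
;; results => {:in-1986 #{}, :now #{[:malcolm]}}
```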

Joins

Query: "Join across entities on a single attribute"

Given the following documents in the database

link:example$test/crux/docs/examples/query_test.clj[role=include]

We can run a query to return a set of tuples that satisfy the join on the attribute :name

link:example$test/crux/docs/examples/query_test.clj[role=include]

Result Set:

link:example$test/crux/docs/examples/query_test.clj[role=include]

Note that every person joins once, plus 2 more matches.

Query: "Join with two attributes, including a multi-valued attribute"

Given the following documents in the database

link:example$test/crux/docs/examples/query_test.clj[role=include]

We can run a query to return a set of entities that :follows the set of entities with the :name value of "Ivan"

link:example$test/crux/docs/examples/query_test.clj[role=include]

Result Set:

link:example$test/crux/docs/examples/query_test.clj[role=include]

Note that because Crux is schemaless there is no need to have elsewhere declared that the :follows attribute may take a value of edn type set.

Streaming Queries

Query results can also be streamed, particularly for queries whose results may not fit into memory. For these, we use crux.api/open-q, which returns a Closeable sequence. Note that results are returned as bags, not sets, so you may wish to deduplicate consecutive identical result tuples (e.g. using clojure.core/dedupe or similar).

We’d recommend using with-open to ensure that the sequence is closed properly. Additionally, ensure that the sequence (as much of it as you need) is eagerly consumed within the with-open block - attempting to use it outside (either explicitly, or by accidentally returning a lazy sequence from the with-open block) will result in undefined behaviour.

link:example$test/crux/docs/examples/query_test.clj[role=include]
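A minimal sketch of this pattern follows. It assumes that the value returned by crux.api/open-q can be walked with iterator-seq and that (crux/start-node {}) starts an in-memory node. The function name all-names and the sample documents are hypothetical. The important part is that into realises the results eagerly before with-open closes the cursor.

```clojure
(require '[crux.api :as crux])

(defn all-names
  "Streams every :name value and collects them into a set.
  The results are fully consumed inside with-open."
  [db]
  (with-open [res (crux/open-q db '{:find [n]
                                    :where [[p :name n]]})]
    ;; `into` is eager, so nothing lazy escapes the with-open block
    (into #{} (map first) (iterator-seq res))))

(def names
  (with-open [node (crux/start-node {})]
    (crux/await-tx
     node
     (crux/submit-tx node [[:crux.tx/put {:crux.db/id :ivan :name "Ivan"}]
                           [:crux.tx/put {:crux.db/id :petr :name "Petr"}]]))
    (all-names (crux/db node))))
;; names => #{"Ivan" "Petr"}
```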

History API

Full Entity History

Crux allows you to retrieve all versions of a given entity:

link:example$test/crux/docs/examples_test.clj[role=include]

Retrieving previous documents

When retrieving the previous versions of an entity, you have the option to additionally return the documents associated with those versions (by using :with-docs? in the additional options map).

link:example$test/crux/docs/examples_test.clj[role=include]

Document History Range

Retrievable entity versions can be bounded by four time coordinates:

  • valid-time-start

  • tx-time-start

  • valid-time-end

  • tx-time-end

All coordinates are inclusive. All coordinates can be null.

link:example$test/crux/docs/examples_test.clj[role=include]

Clojure Tips

Quoting

Logic variables used in queries must always be quoted in the :find and :where clauses, which in the most minimal case could look like the following:

(crux/q db
  {:find ['?e]
   :where [['?e :event/employee-code '?code]]})

However it is often convenient to quote entire clauses or even the entire query map rather than each individual use of every logic variable, for instance:

(crux/q db
  '{:find [?e]
    :where [[?e :event/employee-code ?code]]})

Maps and Vectors in data

Say you have a document like so and you want to add it to a Crux db:

{:crux.db/id :me
 :list ["carrots" "peas" "shampoo"]
 :pockets {:left ["lint" "change"]
           :right ["phone"]}}

Crux breaks down vectors into individual components so that the query engine is able to see every element at the top level. As a result, the query engine does not need to traverse nested structures or run any other kind of search, which would slow the query down. The same approach should be applied to maps: instead of nesting values as :pockets {:left thing :right thing}, put them under a namespace, structuring the data as :pockets/left thing :pockets/right thing so that everything sits at the top level. Like so:

(crux/submit-tx
  node
  [[:crux.tx/put
    {:crux.db/id :me
     :list ["carrots" "peas" "shampoo"]
     :pockets/left ["lint" "change"]
     :pockets/right ["phone"]}]
   [:crux.tx/put
    {:crux.db/id :you
     :list ["carrots" "tomatoes" "wig"]
     :pockets/left ["wallet" "watch"]
     :pockets/right ["spectacles"]}]])

To query inside these vectors the code would be:

(crux/q (crux/db node) '{:find [e l]
                         :where [[e :list l]]
                         :in [l]}
                       "carrots")
;; => #{[:you "carrots"] [:me "carrots"]}

(crux/q (crux/db node) '{:find [e p]
                         :where [[e :pockets/left p]]
                         :in [p]}
                       "watch")
;; => #{[:you "watch"]}

Note that l and p are each returned as a single element because Crux decomposes the vector.

DataScript Differences

This list is not necessarily exhaustive and is based on the partial re-usage of DataScript’s query test suite within Crux’s query tests.

Crux does not support:

  • vars in the attribute position, such as [e ?a "Ivan"] or [e _ "Ivan"]

Crux does not yet support:

  • ground, get-else, get-some, missing?

  • backref attribute syntax (i.e. [?child :example/_child ?parent])

Note that many advanced query features can be achieved via custom predicate function calls since you can currently reference any fully qualified function that is loaded. In future, limitations on available functions may be introduced to enforce security restrictions for remote query execution.

Test queries from DataScript such as "Rule with branches" and "Mutually recursive rules" work correctly with Crux and demonstrate advanced query patterns. See the Crux query tests for details.

Query tests (advanced)

See the Crux query test file for the full suite of query tests, which showcase many combinations of the query capabilities.

Contributors: James Henderson, Jeremy Taylor, Daniel Mason, Alistair, Dan Mason & Steven Deobald