There are a number of answers to that question but the easiest way to getting started is to run the Clojure REPL, obtain a connection to the in-memory database, transact your schema and data, and then start running queries against that data.
If you are not familiar with the concept of a REPL, it is simply a Read–eval–print loop.
From the terminal, run:
brew install leiningen
Leiningen is the easiest way to start using Clojure. From the directory you checked out the Eva repository, launch the REPL with:
lein repl
Once you are inside the REPL we will define a variable which will obtain a connection to the in-memory database.
(def conn (eva/connect {:local true}))
Basically we've defined a variable conn
whose value is the output of the call to the connect function in the eva
namespace with the argument {:local true}
. {:local true}
allows us to create an in-memory database.
Similarly to a traditional RDBMS, Eva has a schema used to describe a set of attributes that can be associated with entities. Let us start with something simple.
(def schema [
{:db/id (eva/tempid :db.part/db)
:db/ident :book/title
:db/doc "Title of a book"
:db/valueType :db.type/string
:db/cardinality :db.cardinality/one
:db.install/_attribute :db.part/db}
{:db/id (eva/tempid :db.part/db)
:db/ident :book/year_published
:db/doc "Date book was published"
:db/valueType :db.type/long
:db/cardinality :db.cardinality/one
:db.install/_attribute :db.part/db}
{:db/id (eva/tempid :db.part/db)
:db/ident :book/author
:db/doc "Author of a book"
:db/valueType :db.type/ref
:db/cardinality :db.cardinality/one
:db.install/_attribute :db.part/db}
{:db/id (eva/tempid :db.part/db)
:db/ident :author/name
:db/doc "Name of author"
:db/valueType :db.type/string
:db/cardinality :db.cardinality/one
:db.install/_attribute :db.part/db}
])
For those of you familiar with schema-based databases you probably get the gist of what we are trying to do here. That is, define a schema containing books and authors.
:db/id
is a keyword used to reference the entity-id for the data we are about to transact.
eva/tempid
returns a temporary id used to reference this datom within the context of the transaction. Tempids are resolved to permanent unique id[entifiers] upon completion of the transaction.
:db.part/db
represents the partition name and is passed to the tempid function.
:db/ident
is used to give the attribute a name that we can use to easily reference it later.
:db/doc
allows us to define a docstring for this particular attribute.
:db/valueType
refers to the type of data used by this entity. A full list of supported types can be found here. Worth calling out is the :db.type/ref
type, which allows you to reference other entities.
:db.install/_attribute
simply states we want to install this entity into the db.part/db
partition.
Now that we've defined our schema we need to transact it into the database.
Transact allows you to make assertions and retraction where assertions are analogous to a traditional insert/update, and retractions are analogous to a traditional delete.
Remember that the variable conn
contains our connection to the database. With the database schema stored in the schema
variable, transacting it into the database is as simple as:
@(eva/transact conn schema)
Note: The transaction is asynchronous, returning a future. The @
dereferences this future, blocking until there is a value to return.
The response of a successful transact
is a map with four keys, :db-before
, :db-after
, :tempids
, and :tx-data
. :db-before
and :db-after
contains snapshots of the database before and after the transaction. :tempids
contains a mapping of temporary ids that occurred as part of the transaction to their corresponding permanent ids. :tx-data
contains the individual #datom
vectors that were inserted into the database.
Alright, now let's add some data! First we define a variable called first-book
:
(def first-book [[:db/add (eva/tempid :db.part/user) :book/title "First Book"]])
This variable will transact a single fact, which we call a datom, into the database. All datoms are made up of the 5-tuple, [eid attr val tx added?]
. In this case, eid
corresponds to (eva/tempid :db.part/user)
, attr
-> :book/title
, and val
-> "First Book"
. tx
and added?
are filled in for us implicitly. tx
refers to the id of the transaction entity that is created as part of every successful transaction and added?
is simply a boolean value indicating whether this fact was added or retracted. The datom is the smallest unit of data that can be manipulated in the database.
:db/add
is the keyword used to indicate an upsertion (insert/update) of data. Unlike our schema, :db/id
is not required here as it is implicit when adding data in list form (more on that later). Instead of the db
partition, which we used for our schema, we now use the :db.part/user
partition.
Transact this data with:
@(eva/transact conn first-book)
We have transacted some data, now let's write a basic query to go and get it.
(def get-book '[:find ?b
:where [?b :book/title "First Book"]])
So we defined a variable called get-book
, and you are probably wondering what that '
means. Wrapping a form with a single quote will return the form without evaluating it as code. Try defining the variable without the '
and see what happens.
Next is the query, [:find ?b :where [?b :book/title "First Book"]]
. There is a lot to take in here so we'll break it down piece-by-piece. First of all, every query you write needs to be wrapped in a vector ([...]
). The query starts with the :find
keyword followed by a number of logic variables (lvar for short) denoted with a ?
. The :where
clause follows and, similarly to SQL, is used to restrict the query results.
The tuple [?b :book/title "First Book"]
is called a data pattern. All querying is essentially matching that pattern to the datom 5-tuple we discussed earlier ([eid attr val tx added?]
). In this case we are asking for all of the entity ids (?b
) which have the attribute :book/title
with value "First Book"
. What about tx
and added?
, why don't they appear in the clause? Simply, if not present they are replaced with implicit blanks. Expanding the tuple to its full form would yield, [?b :book/title "First Book" _ _]
. We'll talk more about blanks later.
Before we can run this query we need to obtain a database snapshot which can be accomplished with:
(def db (eva/db conn))
Passing our connection (conn
) as the argument to eva/db
we get back the current database snapshot. This value represents the state of the database at a given point in time and, given the same query, will always return the same result.
Once we have that we can run our query with:
(eva/q get-book db)
We should see an entity-id as the result.
Run each of the following statements individually in the REPL.
(def dataset
[{:db/id (eva/tempid :db.part/user -1) :author/name "Martin Kleppman"}
{:db/id (eva/tempid :db.part/user -2) :book/title "Designing Data-Intensive Applications" :book/year_published 2017 :book/author (eva/tempid :db.part/user -1)}
{:db/id (eva/tempid :db.part/user -3) :author/name "Aurelien Geron"}
{:db/id (eva/tempid :db.part/user -4) :book/title "Hands-On Machine Learning" :book/year_published 2017 :book/author (eva/tempid :db.part/user -3)}
{:db/id (eva/tempid :db.part/user -5) :author/name "Wil van der Aalst"}
{:db/id (eva/tempid :db.part/user -6) :book/title "Process Mining: Data Science in Action" :book/year_published 2016 :book/author (eva/tempid :db.part/user -5)}
{:db/id (eva/tempid :db.part/user -7) :book/title "Modeling Business Processes: A Petri-Net Oriented Approach" :book/year_published 2011 :book/author (eva/tempid :db.part/user -5)}
{:db/id (eva/tempid :db.part/user -8) :author/name "Edward Tufte"}
{:db/id (eva/tempid :db.part/user -9) :book/title "The Visual Display of Quantitative Information" :book/year_published 2001 :book/author (eva/tempid :db.part/user -8)}
{:db/id (eva/tempid :db.part/user -10) :book/title "Envisioning Information" :book/year_published 1990 :book/author (eva/tempid :db.part/user -8)}
{:db/id (eva/tempid :db.part/user -11) :author/name "Ramez Elmasri"}
{:db/id (eva/tempid :db.part/user -12) :book/title "Operating Systems: A Spiral Approach" :book/year_published 2009 :book/author (eva/tempid :db.part/user -11)}
{:db/id (eva/tempid :db.part/user -13) :book/title "Fundamentals of Database Systems" :book/year_published 2006 :book/author (eva/tempid :db.part/user -11)}])
(def dataset2
[{:db/id (eva/tempid :db.part/user -14) :author/name "Steve McConnell"}
{:db/id (eva/tempid :db.part/user -15) :book/title "Code Complete: A Practical Handbook of Software Construction" :book/year_published 2004 :book/author (eva/tempid :db.part/user -14)}
{:db/id (eva/tempid :db.part/user -16) :book/title "Software Estimation: Demystifying the Black Art" :book/year_published 2006 :book/author (eva/tempid :db.part/user -14)}
{:db/id (eva/tempid :db.part/user -17) :book/title "Rapid Development: Taming Wild Software Schedules" :book/year_published 1996 :book/author (eva/tempid :db.part/user -14)}
{:db/id (eva/tempid :db.part/user -18) :book/title "Software Project Survival Guide" :book/year_published 1997 :book/author (eva/tempid :db.part/user -14)}
{:db/id (eva/tempid :db.part/user -19) :book/title "After the Gold Rush: Creating a True Profession of Software Engineering" :book/year_published 1999 :book/author (eva/tempid :db.part/user -14)}
{:db/id (eva/tempid :db.part/user -20) :author/name "Don Miguel Ruiz"}
{:db/id (eva/tempid :db.part/user -21) :book/title "The Four Agreements: A Practical Guide to Personal Freedom" :book/year_published 2011 :book/author (eva/tempid :db.part/user -20)}
{:db/id (eva/tempid :db.part/user -22) :author/name "Charles Petzold"}
{:db/id (eva/tempid :db.part/user -23) :book/title "Code: The Hidden Language of Computer Hardware and Software"
:book/year_published 2000 :book/author (eva/tempid :db.part/user -22)}
{:db/id (eva/tempid :db.part/user -24) :author/name "Anil Maheshwari"}
{:db/id (eva/tempid :db.part/user -25) :book/title "Data Analytics Made Accessible" :book/year_published 2014 :book/author (eva/tempid :db.part/user -24)}
{:db/id (eva/tempid :db.part/user -26) :author/name "Jeremy Anderson"}
{:db/id (eva/tempid :db.part/user -27) :book/title "Professional Clojure" :book/year_published 2016 :book/author (eva/tempid :db.part/user -26)}])
In the above, we define two vectors (dataset
and dataset2
) of maps containing data that we will transact into the database. One thing you might notice is that we omitted the :db/add
keyword entirely. When you are transacting a single piece of data in a list, as we did above, you are limited to adding or retracting a specific fact about an entity. When transacting data in a map, the :db/add
is implied and you can include any number of attribute/value pairs. Obviously the map form method is preferable when transacting larger amounts of data.
Another thing worth mentioning is that we are now passing a negative number as the second argument to all of our eva/tempid
calls. Given the same partition and negative number, each invocation will return the same temporary id. The reason we do this is so that we can use that tempid to add references between entities in a single transaction. In this example we are using the tempid for the author entity as the reference value for the :book/author
attribute.
Transact this with:
@(eva/transact conn dataset)
@(eva/transact conn dataset2)
Note: The only reason we split this into dataset
and dataset2
is due to a limitation in the REPL where large pastes result in odd failures.
First we are going to demonstrate how we can parameterize our original query (def get-book '[:find ?b :where [?b :book/title "First Book"]])
so we can pass the title of the book we want to find into the query.
Modify our previous data structure with:
(def get-book '[:find ?b
:in $ ?t
:where [?b :book/title ?t]])
There are two things to note here. Firstly we've introduced the :in
keyword, which provides the query with input parameters. The $
is used specifically to denote the db (passed in implicity if the :in
clause is omitted). We've also added a second logic variable to replace the hard-coded book title. Now we can use this query to find the entity id of the book titled "Envisioning Information"
.
(eva/q get-book db "Envisioning Information")
Wait a second, that should have returned a result. Remember, db
is an immutable value of the database. We modified the database since the last time we got that db
value, so we need to get it again with:
(def db (eva/db conn))
Run the query again and we should get the result we desire. As much fun as entity ids are, there is a lot more we can do with the query engine so let's keep going.
(eva/q '[:find ?title
:where
[?b :book/year_published 2017]
[?b :book/title ?title]]
db)
In this example, we use a logic variable ?title
to find the title of books released in 1990. One interesting thing to note about this query is that ?b
is used twice. When a logic variable is used more than once it must represent the same entity in every clause in order to satisfy the set of clauses. This is also referred to as unification. In SQL, this is roughly equivalent to joining the years and titles on the shared entity ids.
(eva/q '[:find ?name
:where
[?b :book/title "Designing Data-Intensive Applications"]
[?b :book/author ?a]
[?a :author/name ?name]] db)
In this query we are following a relationship from a book to its author, and then finding and returning the name of the author. We bind the logic variable ?a
to the entity id of the author associated with the book. Then we use that entity id to get the name of the actual author.
We can just as easily reverse this query and get all of the books published by a certain author.
(eva/q '[:find ?books
:where
[?b :book/title ?books]
[?b :book/author ?a]
[?a :author/name "Steve McConnell"]] db)
Up to this point we have been querying data by unifying individual values after our :find
clause to those in our :where
clause. Using the Pull API, we can make the following call (replace eid
with the result of our very first query (eva/q get-book db "Envisioning Information")
.
(eva/pull db '[*] <eid>)
The Pull API expects a db
as its first argument, similar to how we pass db
to our queries. The second argument [*]
is a pattern, where you can specify which attributes you would like to have returned. Using *
as your pattern indicates you want all of the attributes for this particular entity. The final argument is the id of the entity you are trying to get.
(eva/q '[:find ?name
:where
[_ :book/title ?name]] db)
Instead of using a logic variable to bind the id of a book entity we simply use _
. The underscore is equivalent to a wildcard and will match anything, so every entity id for this attribute will be returned by the query.
As we mentioned before, the use of the :in
clause allows us to pass inputs to our queries. A number of data types are supported, such as:
:in $ [?a ?b]
where you would pass a single tuple of logic variables to a query.
:in $ [?a ...]
where you would pass a collection of inputs (ie: ["Book1" "Book2" "BookN"]) to the query.
:in $ [[?a ?b]]
where you would pass a set of tuples to the query.
Whenever you transact data into Eva, a transaction entity is created as well. The transaction entity contains the instant (timestamp) a transaction is committed, which is useful to know in some circumstances.
(eva/q '[:find ?timestamp
:where
[_ :book/title "Process Mining: Data Science in Action" ?tx]
[?tx :db/txInstant ?timestamp]] db)
It is an important realization that :db
keywords can be queried the same way as user-defined schema. Another thing in the above example that may look weird is that we have four arguments in one of our datom clauses where previously we've only used three. Remember that the datom is a 5-tuple [eid attr val tx added?]
. In this case ?tx
binds to the tx
portion of the datom, which is what we are trying to query for.
(eva/q '[:find ?book ?year
:where
[?b :book/title ?book]
[?b :book/year_published ?year]
[(< ?year 2005)]] db)
The (< ?year 2005)
clause is called a predicate, and filters the result set to only include the results which satisfy the predicate. Any Clojure function or Java method can be used as a predicate.
(eva/q '[:find ?book ?y1
:where
[?b1 :book/title ?book]
[?b1 :book/year_published ?y1]
[?b2 :book/title "Software Project Survival Guide"]
[?b2 :book/year_published ?y2]
[(< ?y1 ?y2)]] db)
What if we want to find the oldest or the newest book? Datalog supports a number of aggregate functions, including min
, max
, avg
and sum
.
(eva/q '[:find (min ?year)
:where
[_ :book/year_published ?year]] db)
Throughout this tutorial, if we wanted to the get the author for a particular book we'd need to write the same two where clauses every time. This gets old really fast, and can get quite tedious for more complicated queries. Rules offer a way to abstract away reusable components of a query.
(def rules '[[(book-author ?book ?name)
[?b :book/title ?book]
[?b :book/author ?a]
[?a :author/name ?name]]])
(eva/q '[:find ?name
:in $ %
:where
(book-author "Modeling Business Processes: A Petri-Net Oriented Approach" ?name)]
db rules)
We've created a rule in this example called book-author
. The part contained within the first vector (...)
is called the head of the rule, containing the name and any logic variables. The rest of the rule looks exactly like the where clauses we've seen up to this point. The ?book
and ?name
variables can be used for both input and output. For example, if you provide a value for ?book
the output will be the author of that book. Vice versa, if you provide a value for ?name
you will get back the titles of all the books for that author. If you provide a value for neither ?book
or ?name
the query will return all the possible combinations in the database. To use a rule in a query firstly we need to pass the rule into the :in
clause using the %
symbol. Secondly we call the rule in one of our :where
clauses like (book-author "Modeling Business Processes: A Petri-Net Oriented Approach" ?name)
.
It is our hope that this tutorial has been successful in bootstrapping your basic data entry and querying abilities. Really though, this tutorial only scratches the surface. There is so much more to learn. For example, did you know that rules can call themselves? And what about getting a database snapshot from a previous point in time? Or, how do I take this knowledge and implement it in an actual app? All are good questions but fall out of the scope of this tutorial (although it is our hope we can dig into these topics at a later date).
Have questions, ideas or topics for more tutorials in the future? Create an issue and we'll do our best to respond.
Can you improve this documentation?Edit on GitHub
cljdoc is a website building & hosting documentation for Clojure/Script libraries
× close