Urania documentation

1. Introduction

Oftentimes, your business logic relies on remote data that you need to fetch from different sources: databases, caches, web services, or third-party APIs, and you can’t afford to get it wrong. Urania helps you keep your business logic free of low-level details while still fetching data efficiently. It can:

batch multiple requests to the same data source
request data from multiple data sources concurrently
cache previous requests

Having all this gives you the ability to access remote data sources in a concise and consistent way, while the library handles batching and overlapping requests to multiple data sources behind the scenes.

2. Installation

The simplest way to use urania in a Clojure project is by including it as a dependency in your project.clj:

[funcool/urania "0.1.0"]

2.1. Limitations

requires Java 8 when used from Clojure due to its use of java.util.concurrent.CompletableFuture
works with the Promesa library only (if you use another async mechanism, such as futures, you can easily adapt your code to work with promises)
assumes your operations with data sources are "side-effect free", so you don’t really care about the order of fetches
you need enough memory to store all the data fetched during a single run! call (if that’s impossible, you should probably look into other ways to solve your problem, e.g. data stream libraries)
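As a rough sketch of the point about futures above, an existing future can be wrapped in a promise with the same two-argument factory form of prom/promise that this guide uses for its simulated data sources. The helper name future->promise is made up for illustration and is not part of urania or Promesa; the snippet is Clojure-only and assumes a Promesa version where prom/promise accepts such a factory function.

```clojure
(require '[promesa.core :as prom])

;; Hypothetical helper (not part of urania or Promesa): wraps an existing
;; future in a promise using the factory form of prom/promise.
(defn future->promise [fut]
  (prom/promise
    (fn [resolve reject]
      ;; Wait for the future on another thread so the caller isn't blocked.
      (future
        (try
          (resolve @fut)
          (catch Throwable e
            (reject e)))))))

(def answer (future->promise (future (+ 40 2))))

@answer
;; => 42
```

A data source whose client library only hands back futures could return such a wrapped promise from its fetch function.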

3. User Guide

3.1. Rationale

A core problem of many systems is balancing expressiveness against performance.

Let’s imagine the problem of calculating the number of friends in common that two users have, where we fetch the user data from a remote data source.

(require '[clojure.set :refer [intersection]])

(defn friends-of
  [id]
  ;; ...
  )

(defn count-common
  [a b]
  (count (intersection a b)))

(defn count-common-friends
  [x y]
  (count-common (friends-of x) (friends-of y)))

(count-common-friends 1 2)

Here, (friends-of x) and (friends-of y) are independent, and you want them to be fetched concurrently or in a single request. Furthermore, if x and y refer to the same person, you don’t want to redundantly re-fetch their friend list. What would the code look like if we applied the mentioned optimizations? We’d have to mix different concerns like caching and batching together with the business logic we perform with the data.
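To make the question concrete, here is one possible hand-written version in plain Clojure, with no urania involved. It is only a sketch: fetch-friends, friends-cache and cached-friends-future are invented names, and the "remote" call is simulated with a short sleep.

```clojure
(require '[clojure.set :refer [intersection]])

;; Stand-in for a slow remote call; the returned data is made up.
(defn fetch-friends [id]
  (Thread/sleep 100)
  (set (range id)))

;; Hand-rolled cache shared by every caller.
(def friends-cache (atom {}))

(defn cached-friends-future
  "Returns a future for the friends of id, reusing one already started."
  [id]
  (-> (swap! friends-cache
             (fn [cache]
               (if (contains? cache id)
                 cache
                 ;; The delay keeps a retried swap! from starting two fetches.
                 (assoc cache id (delay (future (fetch-friends id)))))))
      (get id)
      force))

(defn count-common-friends [x y]
  ;; Start both fetches before blocking on either of them.
  (let [fx (cached-friends-future x)
        fy (cached-friends-future y)]
    (count (intersection @fx @fy))))

(count-common-friends 1 2)
;; => 1
```

The one line of business logic, counting the intersection, is now surrounded by cache and concurrency bookkeeping, and batching has not even been attempted.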

Urania allows your data fetches to be implicitly concurrent with few changes to the original code. Here’s a spoiler:

(require '[urania.core :as u])

(defn count-common-friends [x y]
  (u/map count-common
         (friends-of x)
         (friends-of y)))

(u/run! (count-common-friends 1 2))

As you may have noticed, Urania does so by separating the declaration of a data fetch from its execution. When running your fetches, urania will:

request data from multiple data sources concurrently
batch multiple requests to the same data source
cache repeated requests

3.2. Fetching data from remote sources

Reading data from remote data sources is usually asynchronous and can fail. That’s why urania uses the Promise type available in the Promesa library as its result abstraction.

We’ll start by writing a small function for emulating data sources with unpredictable latency. In ClojureScript:

(require '[promesa.core :as prom])

(defn remote-req [id result]
  (prom/promise
    (fn [resolve reject]
      (let [wait (rand 1000)]
       (println (str "-->[ " id " ] waiting " wait))
       (js/setTimeout #(do (println (str "<--[ " id " ] finished, result: " result))
                           (resolve result))
                      wait)))))

and in Clojure:

(require '[promesa.core :as prom])

(defn remote-req [id result]
  (prom/promise
    (fn [resolve reject]
      (let [wait (rand 1000)]
        (println (str "-->[ " id " ] waiting " wait))
        ;; rand returns a double; Thread/sleep takes a long (milliseconds)
        (Thread/sleep (long wait))
        (println (str "<--[ " id " ] finished, result: " result))
        (resolve result)))))

3.3. Remote data sources

Now, we define our data sources as types that implement Urania’s DataSource protocol. This protocol has two functions:

-identity, which returns an identifier for the resource (used for caching and deduplication).
-fetch, which fetches the result from the remote data source returning a promise.

(require '[urania.core :as u])

(defrecord FriendsOf [id]
  u/DataSource
  (-identity [_] id)
  (-fetch [_ _]
    (remote-req id (set (range id)))))

(defn friends-of [id]
  (FriendsOf. id))

Now let’s try to fetch some data with Urania.

We’ll use urania.core/run! for running a fetch; it returns a promise.

(u/run! (friends-of 10))
;; -->[ 10 ] waiting 510.17118249719886
;; => #<Promise [~]>
;; <--[ 10 ] finished, result: #{0 7 1 4 6 3 2 9 5 8}

We can block for the promise’s result with deref:

(deref
  (u/run! (friends-of 10)))
;; -->[ 10 ] waiting 265.2789087406875
;; <--[ 10 ] finished, result: #{0 7 1 4 6 3 2 9 5 8}
;; => #{0 7 1 4 6 3 2 9 5 8}

Or use Urania’s run!! function. Note that we can only block in Clojure, not in ClojureScript.

(u/run!! (friends-of 10))
;; -->[ 10 ] waiting 265.2789087406875
;; <--[ 10 ] finished, result: #{0 7 1 4 6 3 2 9 5 8}
;; => #{0 7 1 4 6 3 2 9 5 8}

For convenience, the rest of the documentation uses run!!, although it is not available in ClojureScript.

3.3.1. Transforming fetched data

We can use the urania.core/map function to transform the results of a data source.

(u/run!!
  (u/map count (friends-of 10)))
;; -->[ 10 ] waiting 463.370748219846
;; <--[ 10 ] finished, result: #{0 7 1 4 6 3 2 9 5 8}
;; => 10

And compose multiple transformations together:

(u/run!!
  (u/map dec (u/map count (friends-of 10))))
;; -->[ 10 ] waiting 463.370748219846
;; <--[ 10 ] finished, result: #{0 7 1 4 6 3 2 9 5 8}
;; => 9
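The following sketch shows two related usages, assuming urania.core/map behaves as described in this guide: nested transformations can be collapsed into one composed function, and a function of several arguments can be mapped over several data sources, as in the count-common-friends spoiler. The Numbers data source is made up for the example and resolves immediately instead of calling a remote service.

```clojure
(require '[clojure.set :refer [intersection]]
         '[urania.core :as u]
         '[promesa.core :as prom])

;; Illustrative data source that resolves immediately instead of calling
;; a remote service.
(defrecord Numbers [n]
  u/DataSource
  (-identity [_] n)
  (-fetch [_ _] (prom/resolved (set (range n)))))

;; Two nested u/map calls...
(u/run!! (u/map dec (u/map count (Numbers. 10))))
;; => 9

;; ...give the same result as one u/map over the composed function.
(u/run!! (u/map (comp dec count) (Numbers. 10)))
;; => 9

;; With several data sources, the function receives one result per source.
(u/run!! (u/map (fn [a b] (count (intersection a b)))
                (Numbers. 10)
                (Numbers. 5)))
;; => 5
```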

3.3.2. Dependencies between results

Let’s imagine we have another piece of information we want to fetch: a user’s activity score. To fetch a user’s activity score we’ll need to fetch the user first, and urania provides a combinator for doing so: urania.core/mapcat.

First of all, let’s define our activity score data source:

(defrecord ActivityScore [id]
  u/DataSource
  (-identity [_] id)
  (-fetch [_ _]
    (remote-req id (inc id))))

(defn activity
  [id]
  (ActivityScore. id))

Now we want to fetch the activity score of the first friend of a certain user. We need the intermediate result of one fetch before we can continue with the next, so we use urania.core/mapcat:

(defn first-friends-activity
  [id]
  (u/mapcat (fn [friends]
              (activity (first friends)))
            (friends-of id)))

We can now run this fetch:

(u/run!! (first-friends-activity 10))
;; -->[ 10 ] waiting 575.5289747556875
;; <--[ 10 ] finished, result: #{0 7 1 4 6 3 2 9 5 8}
;; -->[ 0 ] waiting 63.24540090623976
;; <--[ 0 ] finished, result: 1
;; => 1

But what if we wanted the activity score for every friend of a user? urania provides a combinator for transforming a list of data sources into a data source that returns a list of results: urania.core/collect.

Let’s use it to collect the activity score for every friend:

(defn friends-activity
  [id]
  (u/mapcat (fn [friends]
              (u/collect (map activity friends)))
            (friends-of id)))

If we run it:

(u/run!! (friends-activity 5))
;; -->[ 5 ] waiting 480.8846764476696
;; <--[ 5 ] finished, result: #{0 1 4 3 2}
;; -->[ 0 ] waiting 488.58045819535687
;; -->[ 1 ] waiting 87.96784013662884
;; -->[ 4 ] waiting 868.2747930486679
;; <--[ 1 ] finished, result: 2
;; -->[ 3 ] waiting 293.59429652774116
;; <--[ 3 ] finished, result: 4
;; -->[ 2 ] waiting 280.68098217346835
;; <--[ 0 ] finished, result: 1
;; <--[ 2 ] finished, result: 3
;; <--[ 4 ] finished, result: 5
;; => [1 2 5 4 3]

As you may have noticed, the data sources passed to urania.core/collect are fetched concurrently. Furthermore, it will detect and eliminate duplicate requests:

(u/run!! (u/collect [(friends-of 1) (friends-of 2) (friends-of 2)]))
;; -->[ 2 ] waiting 634.8383950264134
;; -->[ 1 ] waiting 924.8381446535985
;; <--[ 2 ] finished, result: #{0 1}
;; <--[ 1 ] finished, result: #{0}
;; => [#{0} #{0 1} #{0 1}]

See how the friends of the user with id 2 are only fetched once, even though that data source appears twice in the collection passed to urania.core/collect.
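The deduplication can be pictured with a small model in plain Clojure. This is not urania's implementation, only a sketch of the idea: fetch each distinct identity once, then hand the result back to every position that asked for it. The names fetch-count, fetch-friends and collect-deduplicated are invented for the example.

```clojure
;; Counts how many simulated remote calls were made.
(def fetch-count (atom 0))

;; Stand-in for a remote call returning the friends of id.
(defn fetch-friends [id]
  (swap! fetch-count inc)
  (set (range id)))

(defn collect-deduplicated
  "Fetches each distinct id once and returns results in request order."
  [ids]
  (let [results (into {} (map (juxt identity fetch-friends)) (distinct ids))]
    (mapv results ids)))

(collect-deduplicated [1 2 2])
;; => [#{0} #{0 1} #{0 1}]

@fetch-count
;; => 2
```

Three results come back for three requests, but only two simulated remote calls are made.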

3.3.3. Batching requests

We’ve seen that urania organizes and deduplicates fetches for us but there is still room for improvement. In our examples using urania.core/collect, we’ve seen how requests to the same data source are run concurrently.

In many cases, remote data sources will offer a batch API that we can use to reduce latency when fetching multiple results. If our data sources can be fetched in batches, urania can detect it and optimize our fetches further.

Let’s add batch fetching to ActivityScore. We just need to implement the BatchedSource protocol, which has only one method: -fetch-multi. It receives the data sources to fetch and must return a promise with a map from the data source identities to their results.

(extend-type ActivityScore
  u/BatchedSource
  (-fetch-multi [score scores _]
    (let [ids (cons (:id score) (map :id scores))]
      (remote-req ids (zipmap ids (map inc ids))))))
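Real batch APIs often return a list of rows rather than a map keyed by identity. As a sketch of how such a response could be reshaped into the map that -fetch-multi has to resolve to, here is a plain Clojure example; the endpoint fetch-scores-batch, the helper index-by-identity, and the row keys :user-id and :score are all hypothetical.

```clojure
;; Hypothetical batch endpoint: one call returns a row per requested id.
(defn fetch-scores-batch [ids]
  (mapv (fn [id] {:user-id id :score (inc id)}) ids))

;; Index the rows by identity, the shape -fetch-multi has to resolve to.
(defn index-by-identity [rows]
  (into {} (map (juxt :user-id :score)) rows))

(index-by-identity (fetch-scores-batch [0 1 4 3 2]))
;; => {0 1, 1 2, 4 5, 3 4, 2 3}
```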

Let’s try to run our friends-activity again:

(u/run!! (friends-activity 5))
;; -->[ 5 ] waiting 123.11807342157954
;; <--[ 5 ] finished, result: #{0 1 4 3 2}
;; -->[ (0 1 4 3 2) ] waiting 97.95578032830765
;; <--[ (0 1 4 3 2) ] finished, result: {0 1, 1 2, 4 5, 3 4, 2 3}
;; => [1 2 5 4 3]

Our previous fetch of (friends-activity 5) made n + 1 requests to fetch remote data, where n is the number of results of the first query, and we’ve been able to reduce that to 2!

4. Advanced usage

While providing a convenient high-level API, urania allows you to customize how your fetches are run.

4.1. Caching

urania stores intermediate results in a cache, grouping data sources by their name and mapping their identity to the fetched value. You can run a fetch and get back both the results and the final cache using urania.core/execute! instead of urania.core/run!.

Let’s define a simple data source and fetch some results with urania.core/execute! to see the cached values:

(deftype Simple [id result]
  u/DataSource
  (-identity [_] id)
  (-fetch [_ _] (prom/resolved result)))

(deref
  (u/execute! (Simple. 1 42)))
;; => [42 {"user.Simple" {1 42}}]

You can see how the resulting promise will have a two-element vector, the first being the result and the second the cache.

We can now run the same fetch without calling -fetch again, just by passing a prepopulated cache. We pass it under the :cache keyword in the options map of urania.core/run!:

(u/run! (Simple. 1 42) {:cache {"user.Simple" {1 42}}})

Note that both urania.core/run!! and urania.core/execute! support receiving an options map with the cache.

If you want to programmatically populate a cache, you can do so easily:

(def simple1 (Simple. 1 42))
(def simple2 (Simple. 2 99))

{(u/resource-name simple1) {(u/cache-id simple1) 42
                            (u/cache-id simple2) 99}}
;; => {"user.Simple" {1 42, 2 99}}
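Building on the snippet above, the sketch below folds a sequence of data source and value pairs into a cache and then checks that running with that cache skips the fetch. The helper cache-of, the Counted type and the fetches counter are made up for illustration; only resource-name, cache-id and the :cache option come from this guide.

```clojure
(require '[urania.core :as u]
         '[promesa.core :as prom])

;; Counts calls to -fetch so we can see whether the cache was used.
(def fetches (atom 0))

(deftype Counted [id result]
  u/DataSource
  (-identity [_] id)
  (-fetch [_ _]
    (swap! fetches inc)
    (prom/resolved result)))

;; Hypothetical helper: builds a cache from [data-source value] pairs.
(defn cache-of [pairs]
  (reduce (fn [cache [source value]]
            (assoc-in cache
                      [(u/resource-name source) (u/cache-id source)]
                      value))
          {}
          pairs))

(def source (Counted. 1 42))

(u/run!! source {:cache (cache-of [[source 42]])})
;; => 42

;; Stays at 0 when the value is served from the prepopulated cache.
@fetches
```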

4.2. Executor

urania will run your fetch functions asynchronously by default. In Clojure it’ll use java.util.concurrent.ForkJoinPool/commonPool, whereas in ClojureScript it will use the global setTimeout function.

However, you can customize how the fetch functions are run by providing a custom executor as an option. In Clojure, you can pass any object that implements java.util.concurrent.Executor and it will just work. If you want more fine-grained control, you must pass a type implementing the IExecutor protocol.

Let’s implement a dummy synchronous executor as an example and use it:

(def sync-executor
  (reify u/IExecutor
    (-execute [_ task]
      (task))))

(u/run!! (Simple. 1 42) {:executor sync-executor})
;; => 42
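For the java.util.concurrent.Executor route mentioned above, a minimal Clojure-only sketch could look like this. It assumes a plain fixed thread pool is accepted under :executor as the text describes; the pool size of 2 is arbitrary.

```clojure
(require '[urania.core :as u]
         '[promesa.core :as prom])
(import '(java.util.concurrent Executors ExecutorService))

;; Same Simple data source as in section 4.1, repeated so the snippet
;; stands alone.
(deftype Simple [id result]
  u/DataSource
  (-identity [_] id)
  (-fetch [_ _] (prom/resolved result)))

;; A small dedicated pool instead of the common ForkJoinPool.
(def ^ExecutorService pool (Executors/newFixedThreadPool 2))

(try
  (u/run!! (Simple. 1 42) {:executor pool})
  (finally
    ;; Release the pool's threads once the fetch is done.
    (.shutdown pool)))
;; => 42
```

A dedicated pool like this can help keep blocking fetches away from the threads the rest of the application shares.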

4.3. Environment

Data fetching is commonly performed using stateful objects such as a database connection or an HTTP client. You may have noticed that both -fetch and -fetch-multi take a last argument that we haven’t used so far: the environment.

The environment is a way of passing arguments to the -fetch and -fetch-multi functions. Let’s see it in action:

(defrecord Environment [id]
  u/DataSource
  (-identity [_] id)
  (-fetch [_ env] (prom/resolved env)))

(u/run!! (Environment. 1) {:env {:connection :a-connection}})
;; => {:connection :a-connection}

As you can see, the :env value that we pass in the options is available for single fetches. Batched fetches are no exception:

(defrecord Environment [id]
  u/DataSource
  (-identity [_] id)
  (-fetch [_ env] (prom/resolved env))

  u/BatchedSource
  (-fetch-multi [_ envs env]
    (let [ids (cons id (map :id envs))]
      (prom/resolved (zipmap ids (map vector ids (repeat env)))))))

(u/run!! (u/collect [(Environment. 1) (Environment. 2)])
         {:env {:connection :a-connection}})
;; => [[1 {:connection :a-connection}] [2 {:connection :a-connection}]]
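As a slightly more realistic sketch, the environment can carry the thing a fetch reads from, so the data source itself only holds an identifier. Here a plain map stands in for a database; fake-db, UserName and the :db key are invented for the example.

```clojure
(require '[urania.core :as u]
         '[promesa.core :as prom])

;; Hypothetical stand-in for a database: a plain map from user id to name.
(def fake-db {1 "ada" 2 "grace"})

(defrecord UserName [id]
  u/DataSource
  (-identity [_] id)
  ;; The data source only holds the id; the "connection" comes from env.
  (-fetch [_ env]
    (prom/resolved (get (:db env) id))))

(u/run!! (u/collect [(UserName. 1) (UserName. 2)])
         {:env {:db fake-db}})
;; => ["ada" "grace"]
```

Because the lookup target arrives through :env, the same data source definition can be run against a different map or connection, for instance in tests.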

5. More resources

5.1. Talks

"Reinventing Haxl: Efficient, Concurrent and Concise Data Access" at EuroClojure 2015: [Video](https://goo.gl/masrsz), [Slides](https://goo.gl/h4Zuvr)

Haxl (https://github.com/facebook/Haxl) - Haskell library, Facebook, open-sourced
Stitch (https://www.youtube.com/watch?v=VVpmMfT8aYw) - Scala library, Twitter, not open-sourced

8. License

Copyright (c) 2015 Alexey Kachayev
Copyright (c) 2015 Alejandro Gómez <alejandro@dialelo.com>
Copyright (c) 2015 Andrey Antukh <niwi@niwi.nz>

Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
