Liking cljdoc? Tell your friends :D

Crux Topology Migration Guide, 1.12.0

We talk a lot about Crux being open for extension - you could say it’s one of our favourite features! During the 1.12.0 release we’ll be overhauling the way to start up a Crux node - adding JSON support, a more-idiomatic Java API, and an upgrade for Clojure too.

The full extent of configuration in Crux might seem quite involved - we’re aiming for that sweet spot where it’s easy for end-users to understand and get started, sufficiently powerful for advanced users to set up Crux exactly as they want it, and sufficently extensible for community authors to write and share their own modules.

To start, we’d recommend reading the new Installation and Configuration guides. Details about how to configure specific modules (Kafka, JDBC, et al) are in their individual pages.

Alright - show me some before/after examples already!

A few notes before we begin:

  • The three main pluggable components - the transaction log, the document store, and the query index store - now take centre-stage. Each of these can be configured separately - both the implementation and the parameters can be independently overridden.

  • The config is now nested rather than flat - rather than specifying options at the top-level, include them in the appropriate module’s config.

  • 'Topologies' no longer exist as a separate concept - we specify tx-log, document-store and index-store separately.

  • 'Standalone' no longer exists as a separate concept - because of the KV-backed implementations, it’s now the default.

  • Args passed to modules no-longer need to be namespace-qualified - they are already scoped to their module.

Right, some examples:

  • A 'standalone' (in-memory) node:

    ;; before
    {:crux.node/topology '[crux.standalone/topology]}
    
    ;; after
    {}
  • A standalone RocksDB/LMDB node, with a shared tx-log/document-store. Note the specification of the shared RocksDB instance for the tx-log and document-store.

    ;; before
    {:crux.node/topology '[crux.standalone/topology crux.kv.rocksdb/kv-store]
     :crux.kv/db-dir "indexes"
     :crux.standalone/event-log-kv-store 'crux.kv.rocksdb/kv
     :crux.standalone/event-log-dir "event-log"}
    
    ;; after
    {:legacy-standalone-db {:crux/module `crux.rocksdb/->kv-store, :db-dir "event-log"}
     :crux/tx-log {:kv-store :legacy-standalone-db}
     :crux/document-store {:kv-store :legacy-standalone-db}
     :crux/indexer {:kv-store {:crux/module `crux.rocksdb/->kv-store, :db-dir "indexes"}}}

    It’s no longer necessary to share the RocksDB instance for the tx-log and document-store - you can specify this as follows:

    {:crux/tx-log {:kv-store {:crux/module `crux.rocksdb/->kv-store, :db-dir "tx-log"}}
     :crux/document-store {:kv-store {:crux/module `crux.rocksdb/->kv-store, :db-dir "docs"}}
     :crux/indexer {:kv-store {:crux/module `crux.rocksdb/->kv-store, :db-dir "indexes"}}}

    For LMDB, use crux.lmdb/->kv-store.

  • A Kafka tx-log/document-store with RocksDB query indices:

    The Kafka document store requires a local document store for fast random document access - previously this used the same KV store as the query indices, but now it’s decoupled and explicit:

    (require '[crux.kafka :as k]
             '[crux.rocksdb :as rocks])
    
    ;; before
    {:crux.node/topology '[crux.kafka/topology crux.kv.rocksdb/kv-store]
     :crux.kafka/bootstrap-servers "localhost:9092"
     :crux.kv/db-dir "localdata"}
    
    ;; after, continuing to use existing shared store
    {::k/kafka-config {:bootstrap-servers "localhost:9092"
                       :properties-file ...
                       :properties-map ...}
     :local-kv-store {:crux/module `rocks/->kv-store, :db-dir "localdata"}
     :crux/tx-log {:crux/module `k/->tx-log, :kafka-config ::k/kafka-config}
     :crux/document-store {:crux/module `k/->document-store
                           :kafka-config ::k/kafka-config
                           :local-document-store {:kv-store :local-kv-store}}
     :crux/indexer {:kv-store :local-kv-store}}
    
    ;; after, using new decoupled stores
    {::k/kafka-config {:bootstrap-servers "localhost:9092"}
     :crux/tx-log {:crux/module `k/->tx-log, :kafka-config ::k/kafka-config}
     :crux/document-store {:crux/module `k/->document-store
                           :kafka-config ::k/kafka-config
                           :local-document-store {:kv-store {:crux/module `rocks/->kv-store, :db-dir "docs"}}}
     :crux/indexer {:kv-store {:crux/module `rocks/->kv-store, :db-dir "indexes"}}}
  • A Postgres tx-log/document-store:

    In JDBC, we tend to share the connection pool between tx-log and document-store, although it’s also possible to use different connection pools if need be.

    (:require '[crux.jdbc :as jdbc])
    
    ;; before
    {:crux.node/topology '[crux.jdbc/topology]
     :crux.jdbc/dbtype "postgresql"
     :crux.jdbc/dbname "cruxdb"
     :crux.jdbc/host "<host>"
     :crux.jdbc/user "<user>"
     :crux.jdbc/password "<password>"}
    
    ;; after - shared connection pool
    {::jdbc/connection-pool {:dialect 'crux.jdbc.psql/->dialect
                             :db-spec {:dbname "cruxdb"
                                       :host "<host>"
                                       :user "<user>"
                                       :password "<password>"}}
     :crux/tx-log {:crux/module `crux.jdbc/->tx-log, :connection-pool ::jdbc/connection-pool}
     :crux/document-store {:crux/module `crux.jdbc/->document-store, :connection-pool ::jdbc/connection-pool}
     :crux/indexer {...}}
  • A Kafka tx-log, S3 document store:

    {:crux/tx-log {:crux/module `crux.kafka/->tx-log, :kafka-config {:bootstrap-servers "localhost:9092", ...}}
     :crux/document-store {:crux/module `crux.s3/->document-store, :bucket "s3-docs"}}
  • With an HTTP server

    ;; before
    {:crux.node/topology '[...
                           crux.http-server/server]}
    
    ;; after, being explicit
    {...
     :http-server {:crux/module 'crux.http-server/->server, :port 8080}}
    
    ;; after, using automatic module resolution
    {...
     :crux.http-server/server {:port 8080}}
  • With metrics sent to CloudWatch

    ;; before
    {:crux.node/topology '[...
                           crux.metrics.dropwizard.cloudwatch/reporter]
     ...}
    
    ;; after
    {...
     :crux.metrics.cloudwatch/reporter {...}}
  • With RocksDB metrics

    Previously, you could only attach RocksDB metrics to the query indices KV store - now, they can be requested on any of the KV stores

    ;; before
    {:crux.node/topology '[...
                           crux.kv.rocksdb/kv-store-with-metrics]}
    
    ;; after
    {:crux/indexer {:kv-store {:crux/module `crux.rocksdb/->kv-store
                               :metrics {:crux/module `crux.rocksdb.metrics/->metrics
                                         :instance "indexer"}}}}

If your setup isn’t included here and you’d like some pointers, let us know :)

For module authors:

Modules can currently only be written in Clojure (we’re looking to add Java support in the future).

Module implementations are plain-old Clojure functions, with some additional metadata. By convention, we prefix the names of these functions with ->, implying that the function creates an instance of the module. We then add ::sys/deps and ::sys/args metadata to the functions:

  • ::sys/deps is a map from the local key to the default implementation/configuration/reference of the dependency, specified as above.

  • ::sys/args is a map describing the possible arguments to the component, their specs, whether they’re required (:required? true), and what they default to.

The function itself is then expected to take a map of the started deps and passed args.

;; before

(def my-module
  {::my-first-module {:args {:max-limit {:doc "The maximum limit"
                                         :default 10
                                         :crux.config/type :crux.config/int}}
                      :start-fn (fn [_ {:keys [max-limit]}]
                                  ...)}

   ::my-second-module {:deps #{::my-first-module}
                       :start-fn (fn [{:keys [my-first-module]} _]
                                   ...)}})

;; after

(require '[crux.system :as sys])

(defn ->my-first-module {::sys/args {:max-limit {:spec ::sys/int
                                                 :doc "The maximum limit"
                                                 :required? true
                                                 :default 10}}}
  [{:keys [max-limit]}]
  ...)

(defn ->my-second-module {::sys/deps {:module-1 {:crux/module `->my-first-module, :max-limit 100}}
                          ::sys/args {...}}
  [{:keys [module-1]}]
  ...)

Bear in mind that the end-user can then provide the final value of :max-limit by supplying config like this:

{:module-2 {:crux/module `->my-second-module
            :module-1 {:max-limit 100000}}}

Get in touch!

As always, we’d love to hear from you - whether it’s thoughts on the above, if you’ve found a bug, or showing us what you’ve built. We can be contacted through Zulip, Github, Clojurians' Slack (#crux) or crux@juxt.pro

Cheers!

Crux Team

Can you improve this documentation?Edit on GitHub

cljdoc is a website building & hosting documentation for Clojure/Script libraries

× close