Configuration

To start a Crux node, use either the Java API or the Clojure crux.api namespace.

Within Clojure, call start-node from crux.api, passing it a map of options for the node. The available configuration options are grouped into topologies.

Table 1. Crux Topologies

| Name | Transaction Log | Topology |
|---|---|---|
| Standalone | Uses local event log | :crux.standalone/topology |
| Kafka | Uses Kafka | :crux.kafka/topology |
| JDBC | Uses JDBC event log | :crux.jdbc/topology |

Use a Kafka node when horizontal scalability is required or when you want the guarantees that Kafka offers in terms of resiliency, availability and retention of data.

Multiple Kafka nodes participate in a cluster with Kafka as the primary store and as the central means of coordination.

The JDBC node is useful when you don’t want the overhead of maintaining a Kafka cluster. Read more about the motivations of this setup here.

The Standalone node is a single Crux instance which has everything it needs locally. This is good for experimenting with Crux and for small to medium-sized deployments, where running a single instance is acceptable.

Crux nodes implement the ICruxAPI interface and are the starting point for making use of Crux. Nodes also implement java.io.Closeable and can therefore be lifecycle managed.

Properties

The following properties belong to crux.node, the topology that serves as the base for all other topologies:

Table 2. crux.node configuration

| Property | Default Value |
|---|---|
| :crux.node/object-store | 'crux.object-store/kv-object-store |

From version 20.01-1.7.0-alpha-SNAPSHOT, the kv-store should be specified by including an extra module in the node’s topology vector. For example, a RocksDB backend looks like {:crux.node/topology '[crux.standalone/topology crux.kv.rocksdb/kv-store]}

The following options are used by KV backend implementations and are defined within crux.kv:

Table 3. crux.kv options

| Property | Description | Default Value |
|---|---|---|
| :crux.kv/db-dir | Directory to store K/V files | data |
| :crux.kv/sync? | Sync the KV store to disk after every write? | false |
| :crux.kv/check-and-store-index-version | Check and store index version upon start? | true |

Standalone Node

Using a Crux standalone node is the best way to get started. Once you’ve started a standalone Crux instance as described below, you can then follow the getting started example.

[Figure: Local Standalone Mode]

Table 4. Standalone configuration

| Property | Description | Default Value |
|---|---|---|
| :crux.standalone/event-log-kv-store | Key/Value store to use for standalone event-log persistence | 'crux.kv.rocksdb/kv |
| :crux.standalone/event-log-dir | Directory used to store the event-log and used for backup/restore, e.g. "data/eventlog-1" | |
| :crux.standalone/event-log-sync? | Sync the event-log backend KV store to disk after every write? | false |
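
As an illustration of how the crux.kv and crux.standalone options combine, the following sketch starts a standalone node that persists both its indexes and its event log to disk. It assumes crux-core and the RocksDB module are on the classpath; the directory names are arbitrary examples, not Crux defaults.

```clojure
(require '[crux.api :as crux])

;; Node options are a plain map: the topology vector selects the modules,
;; and the namespaced keys configure them.
(def standalone-opts
  {:crux.node/topology '[crux.standalone/topology
                         crux.kv.rocksdb/kv-store]
   :crux.kv/db-dir "data/db-dir-1"                  ; example path for the indexes
   :crux.standalone/event-log-dir "data/eventlog-1" ; example path for the event log
   :crux.standalone/event-log-sync? false})         ; same as the documented default

(def node (crux/start-node standalone-opts))
```

Keeping the options in a named map makes it easy to reuse the same configuration in tests or to load it from an EDN file.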

Project Dependency

link:./deps.edn[role=include]

Getting started

The following code creates a default crux.standalone node which runs completely within memory (with both the event-log store and db store using crux.kv.memdb/kv):

link:./src/docs/examples.clj[role=include]

link:./src/docs/examples.clj[role=include]

You can later stop the node if you wish:

link:./src/docs/examples.clj[role=include]
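
A minimal sketch of the start/stop cycle described above, assuming crux-core is on the classpath. With only crux.standalone/topology in the topology vector and no directories configured, nothing is written to disk. The names node and n are illustrative.

```clojure
(require '[crux.api :as crux])

;; Start an in-memory standalone node.
(def node
  (crux/start-node {:crux.node/topology '[crux.standalone/topology]}))

;; ... use the node ...

;; Nodes implement java.io.Closeable, so closing the node stops it.
(.close node)

;; Alternatively, with-open closes the node automatically,
;; even if the body throws.
(with-open [n (crux/start-node {:crux.node/topology '[crux.standalone/topology]})]
  (crux/status n))
```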

RocksDB

RocksDB is often used as Crux’s primary store (in place of the in-memory kv store in the example above). In order to use RocksDB within Crux, however, you must first add RocksDB as a project dependency:

Project Dependency

Starting a node using RocksDB

link:./src/docs/examples.clj[role=include]

You can create a node with custom RocksDB options by passing extra keywords in the topology. These are:

  • :crux.kv.rocksdb/disable-wal?, which takes a boolean (if true, disables the write ahead log)

  • :crux.kv.rocksdb/db-options, which takes a RocksDB 'Options' object (see more here, from the RocksDB javadocs)
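
For instance, the write-ahead log option can be set alongside the other node options. This is a sketch that assumes crux-core and the RocksDB module are on the classpath; the data directory is an arbitrary example.

```clojure
(require '[crux.api :as crux])

(def node
  (crux/start-node
   {:crux.node/topology '[crux.standalone/topology
                          crux.kv.rocksdb/kv-store]
    :crux.kv/db-dir "data/db-dir-1"         ; example path
    ;; Skip RocksDB's write-ahead log. Writes that have not yet been
    ;; flushed can be lost if the process crashes.
    :crux.kv.rocksdb/disable-wal? true}))
```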

To include RocksDB metrics in monitoring, use crux.kv.rocksdb/kv-store-with-metrics in the topology in place of crux.kv.rocksdb/kv-store.

LMDB

An alternative to RocksDB, LMDB provides faster queries in exchange for a slower ingest rate.

Project Dependency

Starting a node using LMDB

link:./src/docs/examples.clj[role=include]

Kafka Nodes

When using Crux at scale, it is recommended to use multiple Crux nodes connected via a Kafka cluster.

[Figure: Local Cluster Mode]

Kafka nodes have the following properties:

Table 5. Kafka node configuration

| Property | Description | Default value |
|---|---|---|
| :crux.kafka/bootstrap-servers | URL for connecting to Kafka | localhost:9092 |
| :crux.kafka/tx-topic | Name of Kafka transaction log topic | crux-transaction-log |
| :crux.kafka/doc-topic | Name of Kafka documents topic | crux-docs |
| :crux.kafka/create-topics | Option to automatically create Kafka topics if they do not already exist | true |
| :crux.kafka/doc-partitions | Number of partitions for the document topic | 1 |
| :crux.kafka/replication-factor | Number of times to replicate data on Kafka | 1 |
| :crux.kafka/kafka-properties-file | File to supply Kafka connection properties to the underlying Kafka API | |
| :crux.kafka/kafka-properties-map | Map to supply Kafka connection properties to the underlying Kafka API | |
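
The properties above are passed as keys in the node options map. The sketch below spells out several of the documented defaults explicitly. It assumes crux-core, the Kafka module and the RocksDB module are on the classpath, and that a Kafka broker is reachable at localhost:9092. The db-dir path is an arbitrary example.

```clojure
(require '[crux.api :as crux])

(def kafka-node
  (crux/start-node
   {:crux.node/topology '[crux.kafka/topology
                          crux.kv.rocksdb/kv-store]
    :crux.kafka/bootstrap-servers "localhost:9092"
    :crux.kafka/tx-topic "crux-transaction-log"
    :crux.kafka/doc-topic "crux-docs"
    :crux.kafka/doc-partitions 1
    :crux.kafka/replication-factor 1
    ;; Each node keeps its own local indexes (example path).
    :crux.kv/db-dir "data/kafka-node-db"}))
```

Every node in a cluster should point at the same bootstrap servers and topic names, while each keeps its own local index directory.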

Project Dependencies

link:./deps.edn[role=include]
link:./deps.edn[role=include]

Getting started

Use the API to start a Kafka node, configuring it with the bootstrap-servers property in order to connect to Kafka:

link:./src/docs/examples.clj[role=include]

Note: if you don’t specify a kv-store, the Kafka node uses RocksDB by default, so you will need to add RocksDB to your list of project dependencies.

You can later stop the node if you wish:

link:./src/docs/examples.clj[role=include]

Embedded Kafka

Crux is ready to work with an embedded Kafka for when you don’t have an independently running Kafka available to connect to (such as during development).

Project Dependencies

Getting started

link:./src/docs/examples.clj[role=include]

link:./src/docs/examples.clj[role=include]

You can later stop the Embedded Kafka if you wish:

link:./src/docs/examples.clj[role=include]

JDBC Nodes

JDBC Nodes use next.jdbc internally and pass through the relevant configuration options that you can find here.

[Figure: Local Cluster Mode]

Below is the minimal configuration you will need:

Table 6. Minimal JDBC Configuration

| Property | Description |
|---|---|
| :crux.jdbc/dbtype | One of: postgresql, oracle, mysql, h2, sqlite |
| :crux.jdbc/dbname | Database Name |

Depending on the type of JDBC database used, you may also need some of the following properties:

Table 7. Other JDBC Properties

| Property | Description |
|---|---|
| :crux.kv/db-dir | For h2 and sqlite |
| :crux.jdbc/host | Database Host |
| :crux.jdbc/user | Database Username |
| :crux.jdbc/password | Database Password |
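
Putting the two tables together, a node backed by a PostgreSQL database might be configured as in the sketch below. The database name, host and credentials are hypothetical placeholders. It also assumes that crux-core, the JDBC module and a PostgreSQL JDBC driver are on the classpath.

```clojure
(require '[crux.api :as crux])

(def jdbc-node
  (crux/start-node
   {:crux.node/topology '[crux.jdbc/topology]
    :crux.jdbc/dbtype "postgresql"
    :crux.jdbc/dbname "cruxdb"      ; hypothetical database name
    :crux.jdbc/host "localhost"     ; hypothetical host
    :crux.jdbc/user "crux"          ; hypothetical credentials
    :crux.jdbc/password "secret"}))
```

In practice the password would usually come from the environment or a secrets store rather than being written into the source.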

Project Dependencies

link:./deps.edn[role=include]
link:./deps.edn[role=include]

Getting started

Use the API to start a JDBC node, configuring it with the required parameters:

link:./src/docs/examples.clj[role=include]

HTTP

Crux can be used programmatically as a library, but it also ships with an embedded HTTP server that allows clients to use the API remotely via REST.

[Figure: Remote Cluster Mode]

Set the port configuration property on a Crux node to expose an HTTP port that will accept REST requests:

Table 8. HTTP Nodes Configuration

| Component | Property | Description |
|---|---|---|
| crux.http-server | port | Port for Crux HTTP Server e.g. 8080 |

Visit the guide on using the REST API for examples of how to interact with Crux over HTTP.

Starting an HTTP Server

Project Dependency

link:./deps.edn[role=include]

You can start an HTTP server on a node by including crux.http-server/module in your topology, optionally passing the server port:

link:./src/docs/examples.clj[role=include]
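
A sketch of such a configuration, assuming crux-core and the HTTP server module are on the classpath. The fully qualified key :crux.http-server/port is inferred from the component and property columns of the table above, so treat it as an assumption and check it against your Crux version.

```clojure
(require '[crux.api :as crux])

(def http-node
  (crux/start-node
   {:crux.node/topology '[crux.standalone/topology
                          crux.http-server/module]
    ;; Key assumed from Table 8 (component crux.http-server, property port).
    :crux.http-server/port 8080}))
```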

Using a Remote API Client

Project Dependency

link:./deps.edn[role=include]

To connect to a pre-existing remote node, you need a URL to the node and the above on your classpath. We can then call crux.api/new-api-client, passing the URL. If the node was started on localhost:3000, you can connect to it by doing the following:

link:./src/docs/examples.clj[role=include]

Note: the remote client requires both valid time and transaction time to be specified for all calls to crux/db.
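
The sketch below illustrates this requirement. It assumes the HTTP client module is on the classpath and that a Crux HTTP server is already running at localhost:3000. The timestamps are arbitrary examples; pick a transaction time that the remote node has already indexed.

```clojure
(require '[crux.api :as crux])

(def client (crux/new-api-client "http://localhost:3000"))

;; With the remote client, pass both times explicitly.
(def db
  (crux/db client
           #inst "2020-01-01T00:00:00Z"    ; valid time
           #inst "2020-01-01T00:00:00Z"))  ; transaction time

;; Close the client once it is no longer needed.
(.close client)
```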

Docker

If you wish to use Crux with Docker (no JVM/JDK/Clojure install required!) we have the following:

  • Crux HTTP Node: An image of a standalone Crux node (using an in-memory kv-store by default) and HTTP server, useful if you wish to run a freestanding Crux node accessible over HTTP with Docker as the only requirement.

Artifacts

Alongside the various images available on Dockerhub, there are a number of artifacts available for getting started quickly with Crux. These can be found on the latest release of Crux. Currently, these consist of a number of common configuration uberjars and a custom artifact builder.

To create your own custom artifacts for Crux, do the following:

  • Download and extract the crux-builder.tar.gz from the latest release

  • You can build an uberjar using either Clojure’s deps.edn or Maven (whichever you’re more comfortable with)

    • For Clojure, you can add further Crux dependencies in the deps.edn file, set the node config in crux.edn, and run build-uberjar.sh

    • For Maven, it’s the same, but dependencies go in pom.xml

  • Additionally, you can build a Docker image using the build-docker.sh script in the docker directory.

Backup and Restore

Crux provides utility APIs for local backup and restore when you are using the standalone mode.

An additional example of backup and restore, which applies only to a stopped standalone node, is provided here.

In a clustered deployment, only Kafka’s official backup and restore functionality should be relied on to provide safe durability. The standalone mode’s backup and restore operations can instead be used for creating operational snapshots of a node’s indexes for scaling purposes.

Monitoring

Crux can display metrics through a variety of interfaces. Internally, it uses Dropwizard’s Metrics library to register all the metrics and then passes the registry to reporters, which display the data in a suitable application.

Project Dependency

In order to use any of the crux-metrics reporters, you will need to include the following dependency on crux-metrics:

link:./deps.edn[role=include]

The various types of metric reporters bring in their own sets of dependencies, so we expect these to be provided by the user in their own project (in order to keep the core of crux-metrics as lightweight as possible). Reporters requiring further dependencies will have an 'additional dependencies' section.

Getting Started

By default, indexer and query metrics are included. It is also possible to add RocksDB metrics when RocksDB is being used. The arguments below can be used whenever any of the metrics-reporting modules is included in the topology.

Table 9. Registry arguments

| Field | Property | Default | Description |
|---|---|---|---|
| :crux.metrics/with-indexer-metrics? | boolean | true | Includes indexer metrics in the metrics registry |
| :crux.metrics/with-query-metrics? | boolean | true | Includes query metrics in the metrics registry |
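
As a sketch of how these registry arguments are supplied, the following keeps indexer metrics but switches query metrics off. It uses the console reporter module described under Reporters and assumes crux-core and crux-metrics are on the classpath.

```clojure
(require '[crux.api :as crux])

(def node
  (crux/start-node
   {:crux.node/topology '[crux.standalone/topology
                          crux.metrics.dropwizard.console/reporter]
    :crux.metrics/with-indexer-metrics? true    ; documented default
    :crux.metrics/with-query-metrics? false}))  ; leave query metrics out
```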

RocksDB metrics

To include the RocksDB metrics when monitoring, include the 'crux.kv.rocksdb/kv-store-with-metrics module in the topology (in place of 'crux.kv.rocksdb/kv-store):

(api/start-node {:crux.node/topology ['crux.standalone/topology
                                      'crux.kv.rocksdb/kv-store-with-metrics
                                      ...]
                 ...})

Reporters

Crux currently supports the following outputs:

Console

This component logs metrics to standard output at regular intervals.

(api/start-node {:crux.node/topology ['crux.standalone/topology
                                      'crux.metrics.dropwizard.console/reporter]
                 ...
                 })

Table 10. Console metrics arguments

| Field | Property | Description |
|---|---|---|
| :crux.metrics.dropwizard.console/report-frequency | int | Interval in seconds between output dumps |
| :crux.metrics.dropwizard.console/rate-unit | time-unit | Unit in which rates are displayed |
| :crux.metrics.dropwizard.console/duration-unit | time-unit | Unit in which durations are displayed |

CSV

This component logs metrics to a CSV file at regular intervals. Only the file name is required.

(api/start-node {:crux.node/topology ['crux.standalone/topology
                                      'crux.metrics.dropwizard.csv/reporter]
                 :crux.metrics.dropwizard.csv/file-name "csv-out"
                 ...
                 })

Table 11. CSV metrics arguments

| Field | Property | Required | Description |
|---|---|---|---|
| :crux.metrics.dropwizard.csv/file-name | string | true | Output folder name (must already exist) |
| :crux.metrics.dropwizard.csv/report-frequency | int | false | Interval in seconds between file writes |
| :crux.metrics.dropwizard.csv/rate-unit | time-unit | false | Unit in which rates are displayed |
| :crux.metrics.dropwizard.csv/duration-unit | time-unit | false | Unit in which durations are displayed |
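
Because the output folder must already exist, it can be convenient to create it just before starting the node. The sketch below does this with clojure.java.io. The folder name and the five-second interval are arbitrary examples, and crux-core plus crux-metrics are assumed to be on the classpath.

```clojure
(require '[clojure.java.io :as io]
         '[crux.api :as crux])

(def csv-dir "csv-out")

;; Create the output folder (and any missing parents) if it is not there yet.
(.mkdirs (io/file csv-dir))

(def node
  (crux/start-node
   {:crux.node/topology '[crux.standalone/topology
                          crux.metrics.dropwizard.csv/reporter]
    :crux.metrics.dropwizard.csv/file-name csv-dir
    :crux.metrics.dropwizard.csv/report-frequency 5}))  ; seconds between writes
```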

JMX

Provides JMX MBeans output.

Additional Dependencies

You will need to add the following dependencies, alongside crux-metrics, in your project:

link:../project.clj[role=include]

Getting Started

(api/start-node {:crux.node/topology ['crux.standalone/topology
                                      'crux.metrics.dropwizard.jmx/reporter]
                 ...
                 })

Table 12. JMX metrics arguments

| Field | Property | Description |
|---|---|---|
| :crux.metrics.dropwizard.jmx/domain | string | Change metrics domain group |
| :crux.metrics.dropwizard.jmx/rate-unit | time-unit | Unit in which rates are displayed |
| :crux.metrics.dropwizard.jmx/duration-unit | time-unit | Unit in which durations are displayed |

Prometheus

Additional Dependencies

You will need to add the following dependencies, alongside crux-metrics, in your project:

link:../project.clj[role=include]

HTTP-Exporter

The Prometheus HTTP exporter starts a standalone server hosting Prometheus metrics, by default at http://localhost:8080/metrics. The port can be changed with an argument, and JVM metrics can be included in the dump.

Getting Started

(api/start-node {:crux.node/topology ['crux.standalone/topology
                                      'crux.metrics.dropwizard.prometheus/http-exporter]
                 ...
                 })

Table 13. Prometheus exporter metrics arguments

| Field | Property | Description |
|---|---|---|
| :crux.metrics.dropwizard.prometheus/port | int | Desired port number for the Prometheus client server. Defaults to 8080 |
| :crux.metrics.dropwizard.prometheus/jvm-metrics? | boolean | If true, JVM metrics are included in the metrics dump |

Reporter

This component pushes Prometheus metrics to a specified pushgateway at regular intervals (by default every 1 second).

Getting Started

(api/start-node {:crux.node/topology ['crux.standalone/topology
                                      'crux.metrics.dropwizard.prometheus/reporter]
                 :crux.metrics.dropwizard.prometheus/push-gateway "localhost:9090"
                 ...
                 })

Table 14. Prometheus reporter metrics arguments

| Field | Property | Description |
|---|---|---|
| :crux.metrics.dropwizard.prometheus/push-gateway | string | Address of the Prometheus server. This field is required |
| :crux.metrics.dropwizard.prometheus/report-frequency | duration | Time between metrics pushes, as an ISO-8601 duration. Defaults to "PT1S". |
| :crux.metrics.dropwizard.prometheus/prefix | string | Prefix all metric titles with this string |
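
For readers unfamiliar with the ISO-8601 duration notation used by report-frequency, the following snippet shows how a few such strings are interpreted by java.time.Duration. It only illustrates the notation and does not touch Crux itself.

```clojure
(import 'java.time.Duration)

;; "PT1S" is the documented default: a period of one second.
(.getSeconds (Duration/parse "PT1S"))   ;=> 1
(.getSeconds (Duration/parse "PT30S"))  ;=> 30
(.getSeconds (Duration/parse "PT5M"))   ;=> 300

;; Going the other way, a Duration prints in the same notation.
(str (Duration/ofSeconds 90))           ;=> "PT1M30S"
```

A value such as "PT30S" would therefore request a push every thirty seconds.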

AWS Cloudwatch metrics

Pushes metrics to Cloudwatch. This is intended to be used with a Crux node running inside an EBS/Fargate instance. It attempts to get the relevant credentials through system variables. Crux uses this in its AWS benchmarking system, which can be found here.

Additional Dependencies

You will need to add the following dependencies, alongside crux-metrics, in your project:

link:../project.clj[role=include]

Getting Started

(api/start-node {:crux.node/topology ['crux.standalone/topology
                                      'crux.metrics.dropwizard.cloudwatch/reporter]
                 ...
                 })

Table 15. Cloudwatch metrics arguments

| Field | Property | Description |
|---|---|---|
| :crux.metrics.dropwizard.cloudwatch/duration | duration | Time between metrics pushes |
| :crux.metrics.dropwizard.cloudwatch/dry-run? | boolean | When true, the reporter outputs to clojure.tools.logging/log* |
| :crux.metrics.dropwizard.cloudwatch/jvm-metrics? | boolean | Should JVM metrics be included in the pushed metrics? |
| :crux.metrics.dropwizard.cloudwatch/jvm-dimensions | string-map | Map of dimension names to values attached to the pushed JVM metrics |
| :crux.metrics.dropwizard.cloudwatch/region | string | Cloudwatch region for uploading metrics. Not required inside an EBS/Fargate instance but needed for local testing. |
| :crux.metrics.dropwizard.cloudwatch/ignore-rules | string-list | A list of strings to ignore specific metrics, in gitignore format. e.g. ["crux.tx" "!crux.tx.ingest"] would ignore crux.tx.*, except crux.tx.ingest |

Tips for running

To upload metrics to Cloudwatch locally, the desired region needs to be specified with :crux.metrics.dropwizard.cloudwatch/region, and your AWS credentials at ~/.aws/credentials need to be visible (if run in Docker, mount these as a volume).

When run on AWS using CloudFormation, the node needs to have the 'cloudwatch:PutMetricData' permission. For an example, see Crux’s benchmarking system here.

Can you improve this documentation? These fine people already did:
Daniel Mason, Tom Taylor, Jeremy Taylor, James Henderson, Jon Pither, Dan Mason, Antonelli712, Ivan Fedorov, Ben Gerard & Alex Davis