Within Clojure, we call start-node from within crux.api, passing it a set of options for the node. There are a number of different configuration options a Crux node can have, grouped into topologies.
Name | Transaction Log | Topology |
---|---|---|
Standalone | Uses local event log | |
Kafka | Uses Kafka | |
JDBC | Uses JDBC event log | |
Use a Kafka node when horizontal scalability is required or when you want the guarantees that Kafka offers in terms of resiliency, availability and retention of data.
Multiple Kafka nodes participate in a cluster with Kafka as the primary store and as the central means of coordination.
The JDBC node is useful when you don’t want the overhead of maintaining a Kafka cluster. Read more about the motivations of this setup here.
The Standalone node is a single Crux instance which has everything it needs locally. This is good for experimenting with Crux and for small to medium sized deployments, where running a single instance is permissible.
Crux nodes implement the ICruxAPI interface and are the starting point for making use of Crux. Nodes also implement java.io.Closeable and can therefore be lifecycle managed.
The following properties are within the topology used as a base for the other topologies, crux.node:
Property | Default Value |
---|---|
|
|
From version 20.01-1.7.0-alpha-SNAPSHOT the kv-store should be specified by including an extra module in the node’s topology vector. For example, a rocksdb backend looks like {:crux.node/topology '[crux.standalone/topology crux.kv.rocksdb/kv-store]}
The following set of options are used by KV backend implementations, defined within crux.kv:
Property | Description | Default Value |
---|---|---|
| Directory to store K/V files | data |
| Sync the KV store to disk after every write? | false |
| Check and store index version upon start? | true |
Using a Crux standalone node is the best way to get started. Once you’ve started a standalone Crux instance as described below, you can then follow the getting started example.
Property | Description | Default Value |
---|---|---|
| Key/Value store to use for standalone event-log persistence | 'crux.kv.rocksdb/kv |
| Directory used to store the event-log and used for backup/restore, i.e. | |
| Sync the event-log backend KV store to disk after every write? | false |
Project Dependency
link:./deps.edn[role=include]
Getting started
The following code creates a default crux.standalone node which runs completely within memory (with both the event-log store and db store using crux.kv.memdb/kv):
link:./src/docs/examples.clj[role=include]
link:./src/docs/examples.clj[role=include]
You can later stop the node if you wish:
link:./src/docs/examples.clj[role=include]
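As a minimal sketch of the start and stop steps described above, assuming crux-core is on the classpath and that an options map containing only the standalone topology is enough to get the default in-memory node:

```clojure
(require '[crux.api :as crux])

;; Start a standalone node with default options. Per the text above, the
;; defaults keep both the event-log store and the db store in memory.
(def node
  (crux/start-node {:crux.node/topology '[crux.standalone/topology]}))

;; ... use the node ...

;; Nodes implement java.io.Closeable, so stopping one is a plain .close call.
(.close node)
```

Because nodes are Closeable, binding a short-lived node with with-open instead of def closes it automatically when the body exits.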
RocksDB is often used as Crux’s primary store (in place of the in-memory kv store in the example above). In order to use RocksDB within Crux, however, you must first add RocksDB as a project dependency:
Project Dependency
Starting a node using RocksDB
link:./src/docs/examples.clj[role=include]
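A sketch of what such a node configuration might look like, reusing the topology vector shown in the note near the top of this page. It assumes the RocksDB module is on the classpath and leaves every other option at its default:

```clojure
(require '[crux.api :as crux])

;; Adding crux.kv.rocksdb/kv-store to the topology vector selects RocksDB
;; as the KV backend for the standalone node.
(def rocks-node
  (crux/start-node {:crux.node/topology '[crux.standalone/topology
                                          crux.kv.rocksdb/kv-store]}))
```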
You can create a node with custom RocksDB options by passing extra keywords in the topology. These are:
:crux.kv.rocksdb/disable-wal?, which takes a boolean (if true, disables the write ahead log)
:crux.kv.rocksdb/db-options, which takes a RocksDB 'Options' object (see more here, from the RocksDB javadocs)
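The two keywords above could be combined as in the following sketch. The particular Options setter used here (setMaxOpenFiles) and its value are only illustrative choices, not recommendations from the Crux documentation:

```clojure
(require '[crux.api :as crux])
(import '(org.rocksdb Options))

(def tuned-rocks-node
  (crux/start-node
   {:crux.node/topology '[crux.standalone/topology crux.kv.rocksdb/kv-store]
    ;; If true, disables the RocksDB write ahead log.
    :crux.kv.rocksdb/disable-wal? true
    ;; Any org.rocksdb.Options instance; the setting below is an
    ;; illustrative assumption.
    :crux.kv.rocksdb/db-options (doto (Options.)
                                  (.setMaxOpenFiles 512))}))
```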
To include rocksdb metrics in monitoring, crux.kv.rocksdb/kv-store-with-metrics should be included in the topology vector in place of crux.kv.rocksdb/kv-store.
An alternative to RocksDB, LMDB provides faster queries in exchange for a slower ingest rate.
Project Dependency
Starting a node using LMDB
link:./src/docs/examples.clj[role=include]
When using Crux at scale it is recommended to use multiple Crux nodes connected via a Kafka cluster.
Kafka nodes have the following properties:
Property | Description | Default value |
---|---|---|
| URL for connecting to Kafka | localhost:9092 |
| Name of Kafka transaction log topic | crux-transaction-log |
| Name of Kafka documents topic | crux-docs |
| Option to automatically create Kafka topics if they do not already exist | true |
| Number of partitions for the document topic | 1 |
| Number of times to replicate data on Kafka | 1 |
| File to supply Kafka connection properties to the underlying Kafka API | |
| Map to supply Kafka connection properties to the underlying Kafka API |
Project Dependencies
link:./deps.edn[role=include]
link:./deps.edn[role=include]
Getting started
Use the API to start a Kafka node, configuring it with the bootstrap-servers property in order to connect to Kafka:
link:./src/docs/examples.clj[role=include]
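A hedged sketch of such a call follows. The topology symbol crux.kafka/topology and the fully qualified key :crux.kafka/bootstrap-servers are assumptions that follow the naming pattern used elsewhere on this page, so check them against the version of Crux you are running:

```clojure
(require '[crux.api :as crux])

(def kafka-node
  (crux/start-node
   {;; Assumed topology symbol, following the crux.standalone/topology pattern.
    :crux.node/topology '[crux.kafka/topology crux.kv.rocksdb/kv-store]
    ;; Assumed fully qualified key for the bootstrap-servers property;
    ;; the value is the default listed in the table above.
    :crux.kafka/bootstrap-servers "localhost:9092"}))
```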
If you don’t specify kv-store then by default the Kafka node will use RocksDB. You will need to add RocksDB to your list of project dependencies.
You can later stop the node if you wish:
link:./src/docs/examples.clj[role=include]
Crux is ready to work with an embedded Kafka for when you don’t have an independently running Kafka available to connect to (such as during development).
Project Dependencies
Getting started
link:./src/docs/examples.clj[role=include]
link:./src/docs/examples.clj[role=include]
You can later stop the Embedded Kafka if you wish:
link:./src/docs/examples.clj[role=include]
JDBC Nodes use next.jdbc internally and pass through the relevant configuration options that you can find here.
Below is the minimal configuration you will need:
Property | Description |
---|---|
| One of: postgresql, oracle, mysql, h2, sqlite |
| Database Name |
Depending on the type of JDBC database used, you may also need some of the following properties:
Property | Description |
---|---|
| For h2 and sqlite |
| Database Host |
| Database Username |
| Database Password |
Project Dependencies
link:./deps.edn[role=include]
link:./deps.edn[role=include]
Getting started
Use the API to start a JDBC node, configuring it with the required parameters:
link:./src/docs/examples.clj[role=include]
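The following is a sketch of a PostgreSQL-backed node. The topology symbol and every :crux.jdbc/... key are assumptions derived from the property descriptions in the tables above, and all values are placeholders:

```clojure
(require '[crux.api :as crux])

(def jdbc-node
  (crux/start-node
   {;; Assumed topology symbol, following the crux.standalone/topology pattern.
    :crux.node/topology '[crux.jdbc/topology]
    ;; Assumed keys for the minimal configuration (database type and name).
    :crux.jdbc/dbtype "postgresql"
    :crux.jdbc/dbname "cruxdb"
    ;; Assumed keys for the optional connection details; placeholder values.
    :crux.jdbc/host "localhost"
    :crux.jdbc/user "crux"
    :crux.jdbc/password "secret"}))
```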
Crux can be used programmatically as a library, but it also ships with an embedded HTTP server that allows clients to use the API remotely via REST.
Set the server-port configuration property on a Crux node to expose an HTTP port that will accept REST requests:
Component | Property | Description |
---|---|---|
crux.http-server | | Port for Crux HTTP Server e.g. |
Visit the guide on using the REST api for examples of how to interact with Crux over HTTP.
Project Dependency
link:./deps.edn[role=include]
You can start up an HTTP server on a node by including crux.http-server/module in your topology, optionally passing the server port:
link:./src/docs/examples.clj[role=include]
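A sketch of a standalone node with the HTTP module added. The module symbol comes from the text above, while the port key :crux.http-server/port is an assumed name for the server port property and may differ in your version:

```clojure
(require '[crux.api :as crux])

(def http-node
  (crux/start-node
   {:crux.node/topology '[crux.standalone/topology crux.http-server/module]
    ;; Assumed key name for the optional server port property.
    :crux.http-server/port 3000}))
```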
Project Dependency
link:./deps.edn[role=include]
To connect to a pre-existing remote node, you need a URL to the node and the above on your classpath. We can then call crux.api/new-api-client, passing the URL. If the node was started on localhost:3000, you can connect to it by doing the following:
link:./src/docs/examples.clj[role=include]
The remote client requires both valid time and transaction time to be specified for all calls to crux/db.
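As a sketch of the connection step and the explicit-time requirement, assuming a node is already serving HTTP on localhost:3000. The helper db-at is a hypothetical name introduced only for illustration:

```clojure
(require '[crux.api :as crux])

;; Connect to a remote node that was started on localhost:3000.
(def client (crux/new-api-client "http://localhost:3000"))

(defn db-at
  "Hypothetical helper: always passes both times to crux/db, as the remote
  client requires. valid-time and tx-time are java.util.Date instances."
  [client valid-time tx-time]
  (crux/db client valid-time tx-time))
```

A transaction time can be taken from the result of an earlier transaction so that it refers to a point the node has already processed.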
If you wish to use Crux with Docker (no JVM/JDK/Clojure install required!) we have the following:
Crux HTTP Node: An image of a standalone Crux node (using an in-memory kv-store by default) and HTTP server, useful if you want a freestanding Crux node accessible over HTTP with Docker as the only requirement.
Alongside the various images available on Dockerhub, there are a number of artifacts available for getting started quickly with Crux. These can be found on the latest release of Crux. Currently, these consist of a number of common configuration uberjars and a custom artifact builder.
To create your own custom artifacts for Crux, do the following:
Download and extract the crux-builder.tar.gz from the latest release
You can build an uberjar using either Clojure’s deps.edn or Maven (whichever you’re more comfortable with)
For Clojure, you can add further Crux dependencies in the deps.edn file, set the node config in crux.edn, and run build-uberjar.sh
For Maven, it’s the same, but dependencies go in pom.xml
Additionally, you can build a Docker image using the build-docker.sh script in the docker directory.
Crux provides utility APIs for local backup and restore when you are using the standalone mode.
An additional example of backup and restore, which applies only to a stopped standalone node, is provided here.
In a clustered deployment, only Kafka’s official backup and restore functionality should be relied on to provide safe durability. The standalone mode’s backup and restore operations can instead be used for creating operational snapshots of a node’s indexes for scaling purposes.
Crux can display metrics through a variety of interfaces. Internally, it uses dropwizard’s metrics library to register all the metrics and then passes the registry around to reporters to display the data in a suitable application.
By default indexer and query metrics are included. It is also possible to add rocksdb metrics when RocksDB is being used. These arguments can be used whenever any of the metrics-reporting topologies are included.
Field | Property | Default | Description |
---|---|---|---|
| | | Includes indexer metrics in the metrics registry |
| | | Includes query metrics in the metrics registry |
To include rocksdb metrics in monitoring, the 'crux.kv.rocksdb/kv-store-with-metrics module should be included in the topology vector instead of 'crux.kv.rocksdb/kv-store
(api/start-node {:crux.node/topology ['crux.standalone/topology
                                      'crux.kv.rocksdb/kv-store-with-metrics
                                      ...]
                 ...})
Crux currently supports the following outputs:
Console stdout
CSV file
Prometheus (reporter & http exporter)
This component logs metrics to stdout at regular intervals.
(api/start-node {:crux.node/topology ['crux.standalone/topology
                                      'crux.metrics/with-console]
                 ...
                 })
Field | Property | Description |
---|---|---|
| | Interval in seconds between output dumps |
| | Unit in which rates are displayed |
| | Unit in which durations are displayed |
This component logs metrics to a CSV file at regular intervals. Only the file name is required.
(api/start-node {:crux.node/topology ['crux.standalone/topology
                                      'crux.metrics/with-csv]
                 :crux.metrics.dropwizard.csv/file-name "out.csv"
                 ...
                 })
Field | Property | Required | Description |
---|---|---|---|
| | | Output file location |
| | | Interval in seconds between file writes |
| | | Unit in which rates are displayed |
| | | Unit in which durations are displayed |
Provides JMX mbeans output.
(api/start-node {:crux.node/topology ['crux.standalone/topology
                                      'crux.metrics/with-jmx]
                 ...
                 })
Field | Property | Description |
---|---|---|
| | Change metrics domain group |
| | Unit in which rates are displayed |
| | Unit in which durations are displayed |
The Prometheus HTTP exporter starts a standalone server hosting Prometheus metrics, by default at http://localhost:8080/metrics. The port can be changed with an argument, and JVM metrics can be included in the dump.
(api/start-node {:crux.node/topology ['crux.standalone/topology
                                      'crux.metrics/with-prometheus-http-exporter]
                 ...
                 })
Field | Property | Description |
---|---|---|
| | Desired port number for prometheus client server. Defaults to |
| | If |
This component pushes Prometheus metrics to a specified pushgateway at regular intervals (by default every 1 second).
(api/start-node {:crux.node/topology ['crux.standalone/topology
                                      'crux.metrics/with-prometheus-reporter]
                 :crux.metrics.dropwizard.prometheus/pushgateway "localhost:9090"
                 ...
                 })
Field | Property | Description |
---|---|---|
| | Address of the prometheus server. This field is required |
| | Time in ISO-8601 standard between metrics push. Defaults to "PT1S". |
| | Prefix all metric titles with this string |
Pushes metrics to Cloudwatch. This is intended to be used with a Crux node running inside an EBS/Fargate instance. It attempts to get the relevant credentials through system variables. Crux uses this in its AWS benchmarking system, which can be found here.
(api/start-node {:crux.node/topology ['crux.standalone/topology
                                      'crux.metrics/with-cloudwatch]
                 ...
                 })
Field | Property | Description |
---|---|---|
| | Time between metrics push |
| | When |
| | Should jvm metrics be included in the pushed metrics? |
| | Should jvm metrics be included in the pushed metrics? |
| | Cloudwatch region for uploading metrics. Not required inside an EBS/Fargate instance but needed for local testing. |
| | A list of strings to ignore specific metrics, in gitignore format. e.g. |
To upload metrics to Cloudwatch locally, the desired region needs to be specified with :crux.metrics.dropwizard.prometheus/region, and your AWS credentials at ~/.aws/credentials need to be visible (if run in Docker, mount these as a volume).
When run on AWS using CloudFormation, the node needs to have the permission 'cloudwatch:PutMetricData'. For an example see Crux’s benchmarking system here.
Can you improve this documentation? These fine people already did:
Daniel Mason, Tom Taylor, Jeremy Taylor, James Henderson, Jon Pither, Dan Mason, Antonelli712, Ivan Fedorov, Ben Gerard & Alex Davis