When using Crux at scale it is recommended to use multiple Crux nodes connected via a Kafka cluster. Use of multiple nodes provides availability and Kafka itself provides strong fault-tolerance guarantees. Kafka can be used for Crux’s transaction log and document store components.
Kafka’s document store requires a copy of the documents kept locally for random access - these can be stored in a KV store like RocksDB or LMDB. For this reason, unless you want to keep both transactions and documents on Kafka (for simplicity, say, or historical reasons), we’d now recommend a different document store implementation - JDBC or S3, for example. (The Kafka transaction log does not have this requirement) |
juxt/crux-kafka {:mvn/version "{crux_version}-beta"}
<dependency>
<groupId>juxt</groupId>
<artifactId>crux-kafka</artifactId>
<version>{crux_version}-beta</version>
</dependency>
{
"crux/tx-log": {
"crux/module": "crux.kafka/->tx-log",
"kafka-config": {
"bootstrap-servers": "localhost:9092",
...
},
"tx-topic-opts": {
"topic-name": "crux-transaction-log",
...
},
"poll-wait-duration": "PT1S"
},
...
}
{:crux/tx-log {:crux/module 'crux.kafka/->tx-log
:kafka-config {:bootstrap-servers "localhost:9092"}
:tx-topic-opts {:topic-name "crux-transaction-log"}
:poll-wait-duration (Duration/ofSeconds 1)}
...}
{:crux/tx-log {:crux/module crux.kafka/->tx-log
:kafka-config {:bootstrap-servers "localhost:9092"}
:tx-topic-opts {:topic-name "crux-transaction-log"}
:poll-wait-duration "PT1S"}
...}
If you do not want the local node to index transactions, you can use the crux.kafka/->ingest-only-tx-log
module.
{
"crux/document-store": {
"crux/module": "crux.kafka/->document-store",
"kafka-config": {
"bootstrap-servers": "localhost:9092",
...
},
"doc-topic-opts": {
"topic-name": "crux-docs",
...
},
"local-document-store": {
"kv-store": {
"crux/module": "crux.rocksdb/->kv-store",
"db-dir": "/tmp/rocksdb"
}
},
"poll-wait-duration": "PT1S"
},
...
}
{:crux/document-store {:crux/module 'crux.kafka/->document-store
:kafka-config {:bootstrap-servers "localhost:9092"
...}
:doc-topic-opts {:topic-name "crux-docs"
...}
:local-document-store {:kv-store {:crux/module 'crux.rocksdb/->kv-store
:db-dir (io/file "/tmp/rocksdb")}}
:poll-wait-duration (Duration/ofSeconds 1)}
...}
{:crux/document-store {:crux/module crux.kafka/->document-store
:kafka-config {:bootstrap-servers "localhost:9092"
...}
:doc-topic-opts {:topic-name "crux-docs"
...}
:local-document-store {:kv-store {:crux/module crux.rocksdb/->kv-store
:db-dir "/tmp/rocksdb"}}
:poll-wait-duration "PT1S"}
...}
If you do not want the local node to index transactions, you can use the crux.kafka/->ingest-only-document-store
module.
You can use the same local document store as the query indices, as follows:
{
"local-rocksdb": {
"crux/module": "crux.rocksdb/->kv-store",
"db-dir": "/tmp/rocksdb"
},
"crux/document-store": {
...
"local-document-store": {
"kv-store": "local-rocksdb"
}
},
"crux/index-store": {
"kv-store": "local-rocksdb"
}
...
}
{...
:local-rocksdb {:crux/module 'crux.rocksdb/->kv-store
:db-dir (io/file "/tmp/rocksdb")}
:crux/document-store {...
:local-document-store {:kv-store :local-rocksdb}}
:crux/index-store {:kv-store :local-rocksdb}}
{...
:local-rocksdb {:crux/module crux.rocksdb/->kv-store
:db-dir "/tmp/rocksdb"}
:crux/document-store {...
:local-document-store {:kv-store :local-rocksdb}}
:crux/index-store {:kv-store :local-rocksdb}}
If you’re using Kafka for both the transaction log and the document store, you can share connection config between them:
{
"kafka-config": {
"crux/module": "crux.kafka/->kafka-config",
"bootstrap-servers": "localhost:9092",
...
},
"crux/tx-log": {
"crux/module": "crux.kafka/->tx-log",
"kafka-config": "kafka-config",
...
}
"crux/document-store": {
"crux/module": "crux.kafka/->document-store",
"kafka-config": "kafka-config",
...
}
}
{:kafka-config {:crux/module 'crux.kafka/->kafka-config
:bootstrap-servers "localhost:9092"
...}
:crux/tx-log {:crux/module 'crux.kafka/->tx-log
:kafka-config :kafka-config
...}
:crux/document-store {:crux/module 'crux.kafka/->document-store
:kafka-config :kafka-config
...}}
{:kafka-config {:crux/module crux.kafka/->kafka-config
:bootstrap-servers "localhost:9092"
...}
:crux/tx-log {:crux/module crux.kafka/->tx-log
:kafka-config :kafka-config
...}
:crux/document-store {:crux/module crux.kafka/->document-store
:kafka-config :kafka-config
...}}
crux.kafka/->kafka-config
)tx-topic-opts
(topic options)
bootstrap-servers
(string, default "localhost:9092"
): URL for connecting to Kafka
properties-file
(string/File
/Path
): Kafka connection properties file, supplied directly to Kafka
properties-map
(map): Kafka connection properties map, supplied directly to Kafka
crux.kafka/->topic-opts
)topic-name
(string, required, default "tx-topic"
for tx-log, "doc-topic"
for document-store)
num-partitions
(int, default 1)
replication-factor
(int, default 1): level of durability for Kafka
create-topics?
(boolean, default true): whether to create topics if they do not exist
topic-config
(map): any further topic config to pass directly to Kafka
crux.kafka/->tx-log
)kafka-config
(connection config)
tx-topic-opts
(topic options)
poll-wait-duration
(string/Duration
, default 1 second, "PT1S"
): time to wait on each Kafka poll.
poll-sleep-duration
(string/Duration
, default 1 second, "PT1S"
): time to sleep between each poll, if the previous poll didn’t yield any transactions.
crux.kafka/->ingest-only-tx-log
)kafka-config
(connection config)
tx-topic-opts
(topic options)
crux.kafka/->document-store
)kafka-config
(connection config)
doc-topic-opts
(topic options)
local-document-store
(document store, default local in-memory kv-store)
poll-wait-duration
(string/Duration
, default 1 second, "PT1S"
): time to wait on each Kafka poll.
poll-sleep-duration
(string/Duration
, default 1 second, "PT1S"
): time to sleep between each poll, if the previous poll didn’t yield any transactions.
crux.kafka/->ingest-only-document-store
)kafka-config
(connection config)
tx-topic-opts
(topic options)
Can you improve this documentation? These fine people already did:
James Henderson, Daniel Mason, Jeremy Taylor & Dan MasonEdit on GitHub
cljdoc is a website building & hosting documentation for Clojure/Script libraries
× close