Kafka Nodes

When using Crux at scale it is recommended to use multiple Crux nodes connected via a Kafka cluster. Use of multiple nodes provides availability and Kafka itself provides strong fault-tolerance guarantees. Kafka can be used for Crux’s transaction log and document store components.

[Figure: Local Cluster Mode]

Kafka’s document store requires a copy of the documents to be kept locally for random access. These can be stored in a KV store such as RocksDB or LMDB.

For this reason, unless you want to keep both transactions and documents on Kafka (for simplicity, say, or for historical reasons), we now recommend a different document store implementation, such as JDBC or S3.

(The Kafka transaction log does not have this requirement.)

Project Dependencies

deps.edn
juxt/crux-kafka {:mvn/version "{crux_version}-beta"}
pom.xml
<dependency>
    <groupId>juxt</groupId>
    <artifactId>crux-kafka</artifactId>
    <version>{crux_version}-beta</version>
</dependency>

Example configuration

Kafka as a Transaction Log

JSON
{
  "crux/tx-log": {
    "crux/module": "crux.kafka/->tx-log",
    "kafka-config": {
      "bootstrap-servers": "localhost:9092",
      ...
    },

    "tx-topic-opts": {
      "topic-name": "crux-transaction-log",
      ...
    },

    "poll-wait-duration": "PT1S"
  },

  ...
}
Clojure
{:crux/tx-log {:crux/module 'crux.kafka/->tx-log
               :kafka-config {:bootstrap-servers "localhost:9092"}
               :tx-topic-opts {:topic-name "crux-transaction-log"}
               :poll-wait-duration (Duration/ofSeconds 1)}
 ...}
EDN
{:crux/tx-log {:crux/module crux.kafka/->tx-log
               :kafka-config {:bootstrap-servers "localhost:9092"}
               :tx-topic-opts {:topic-name "crux-transaction-log"}
               :poll-wait-duration "PT1S"}
 ...}
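
As a rough sketch of how the Clojure configuration above might be used, the following starts a node with `crux.api/start-node`, submits one document and waits for this node to index it. It assumes a Kafka broker is reachable at localhost:9092 and that both crux-core and crux-kafka are on the classpath. The document id `:example-doc` is purely illustrative, and the remaining components are left at their defaults.

```clojure
(require '[crux.api :as crux])
(import 'java.time.Duration)

;; Assumes a Kafka broker at localhost:9092 (not started by this sketch).
(with-open [node (crux/start-node
                  {:crux/tx-log {:crux/module 'crux.kafka/->tx-log
                                 :kafka-config {:bootstrap-servers "localhost:9092"}
                                 :tx-topic-opts {:topic-name "crux-transaction-log"}
                                 :poll-wait-duration (Duration/ofSeconds 1)}})]
  ;; Submit a single (hypothetical) document, then block until it is indexed.
  (let [tx (crux/submit-tx node [[:crux.tx/put {:crux.db/id :example-doc}]])]
    (crux/await-tx node tx)
    ;; Read the document back from the node's current database snapshot.
    (crux/entity (crux/db node) :example-doc)))
```

Using `with-open` closes the node, and with it the Kafka connections, when the body finishes.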

If you do not want the local node to index transactions, you can use the crux.kafka/->ingest-only-tx-log module.
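
A minimal sketch of such an ingest-only configuration in Clojure, assuming the same broker address and topic name as the example above. Only the two keys listed for this module under Parameters are used:

```clojure
;; Ingest-only variant: takes kafka-config and tx-topic-opts only.
;; No poll-* options are given, as none are listed for this module.
{:crux/tx-log {:crux/module 'crux.kafka/->ingest-only-tx-log
               :kafka-config {:bootstrap-servers "localhost:9092"}
               :tx-topic-opts {:topic-name "crux-transaction-log"}}}
```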

Kafka as a Document Store

JSON
{
  "crux/document-store": {
    "crux/module": "crux.kafka/->document-store",
    "kafka-config": {
      "bootstrap-servers": "localhost:9092",
      ...
    },
    "doc-topic-opts": {
      "topic-name": "crux-docs",
      ...
    },
    "local-document-store": {
      "kv-store": {
        "crux/module": "crux.rocksdb/->kv-store",
        "db-dir": "/tmp/rocksdb"
      }
    },
    "poll-wait-duration": "PT1S"
  },

  ...
}
Clojure
{:crux/document-store {:crux/module 'crux.kafka/->document-store
                       :kafka-config {:bootstrap-servers "localhost:9092"
                                      ...}
                       :doc-topic-opts {:topic-name "crux-docs"
                                        ...}
                       :local-document-store {:kv-store {:crux/module 'crux.rocksdb/->kv-store
                                                         :db-dir (io/file "/tmp/rocksdb")}}
                       :poll-wait-duration (Duration/ofSeconds 1)}
 ...}
EDN
{:crux/document-store {:crux/module crux.kafka/->document-store
                       :kafka-config {:bootstrap-servers "localhost:9092"
                                      ...}
                       :doc-topic-opts {:topic-name "crux-docs"
                                        ...}
                       :local-document-store {:kv-store {:crux/module crux.rocksdb/->kv-store
                                                         :db-dir "/tmp/rocksdb"}}
                       :poll-wait-duration "PT1S"}
 ...}

If you do not want the local node to index transactions, you can use the crux.kafka/->ingest-only-document-store module.

Sharing the local KV store

You can use the same local KV store for both the local document store and the query indices, as follows:

JSON
{
  "local-rocksdb": {
    "crux/module": "crux.rocksdb/->kv-store",
    "db-dir": "/tmp/rocksdb"
  },

  "crux/document-store": {
    ...
    "local-document-store": {
      "kv-store": "local-rocksdb"
    }
  },

  "crux/index-store": {
    "kv-store": "local-rocksdb"
  },

  ...
}
Clojure
{...
 :local-rocksdb {:crux/module 'crux.rocksdb/->kv-store
                 :db-dir (io/file "/tmp/rocksdb")}
 :crux/document-store {...
                       :local-document-store {:kv-store :local-rocksdb}}
 :crux/index-store {:kv-store :local-rocksdb}}
EDN
{...
 :local-rocksdb {:crux/module crux.rocksdb/->kv-store
                 :db-dir "/tmp/rocksdb"}
 :crux/document-store {...
                       :local-document-store {:kv-store :local-rocksdb}}
 :crux/index-store {:kv-store :local-rocksdb}}

Sharing connection config between the transaction log and the document store

If you’re using Kafka for both the transaction log and the document store, you can share connection config between them:

JSON
{
  "kafka-config": {
    "crux/module": "crux.kafka/->kafka-config",
    "bootstrap-servers": "localhost:9092",
    ...
  },

  "crux/tx-log": {
    "crux/module": "crux.kafka/->tx-log",
    "kafka-config": "kafka-config",
    ...
  },

  "crux/document-store": {
    "crux/module": "crux.kafka/->document-store",
    "kafka-config": "kafka-config",
    ...
  }
}
Clojure
{:kafka-config {:crux/module 'crux.kafka/->kafka-config
                :bootstrap-servers "localhost:9092"
                ...}
 :crux/tx-log {:crux/module 'crux.kafka/->tx-log
               :kafka-config :kafka-config
               ...}
 :crux/document-store {:crux/module 'crux.kafka/->document-store
                       :kafka-config :kafka-config
                       ...}}
EDN
{:kafka-config {:crux/module crux.kafka/->kafka-config
                :bootstrap-servers "localhost:9092"
                ...}
 :crux/tx-log {:crux/module crux.kafka/->tx-log
               :kafka-config :kafka-config
               ...}
 :crux/document-store {:crux/module crux.kafka/->document-store
                       :kafka-config :kafka-config
                       ...}}
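
Both sharing techniques can be combined in one node configuration. The following is a sketch only, reusing the illustrative topic names and the /tmp/rocksdb path from the examples above; `node-opts` is a hypothetical name, and the map would be passed to the node when it is started.

```clojure
(require '[clojure.java.io :as io])

(def node-opts
  {;; Shared Kafka connection config, referenced by keyword below.
   :kafka-config {:crux/module 'crux.kafka/->kafka-config
                  :bootstrap-servers "localhost:9092"}
   ;; Shared local KV store: holds the local document copy and the query indices.
   :local-rocksdb {:crux/module 'crux.rocksdb/->kv-store
                   :db-dir (io/file "/tmp/rocksdb")}
   :crux/tx-log {:crux/module 'crux.kafka/->tx-log
                 :kafka-config :kafka-config
                 :tx-topic-opts {:topic-name "crux-transaction-log"}}
   :crux/document-store {:crux/module 'crux.kafka/->document-store
                         :kafka-config :kafka-config
                         :doc-topic-opts {:topic-name "crux-docs"}
                         :local-document-store {:kv-store :local-rocksdb}}
   :crux/index-store {:kv-store :local-rocksdb}})
```

Referring to `:kafka-config` and `:local-rocksdb` by keyword means each is defined once, so the transaction log and document store cannot drift apart in their connection settings.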

Parameters

Connection config (crux.kafka/->kafka-config)

  • bootstrap-servers (string, default "localhost:9092"): URL for connecting to Kafka

  • properties-file (string/File/Path): Kafka connection properties file, supplied directly to Kafka

  • properties-map (map): Kafka connection properties map, supplied directly to Kafka
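
As an illustration of properties-map, the sketch below passes two standard Kafka client properties through the connection config. It assumes that keys and values are given as plain strings named after Kafka's own client properties; the specific values are examples only, not recommendations.

```clojure
;; Assumption: properties-map uses string keys/values matching Kafka client
;; property names. "security.protocol" and "request.timeout.ms" are standard
;; Kafka client settings; the values here are illustrative.
{:kafka-config {:crux/module 'crux.kafka/->kafka-config
                :bootstrap-servers "localhost:9092"
                :properties-map {"security.protocol" "SSL"
                                 "request.timeout.ms" "30000"}}}
```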

Topic options (crux.kafka/->topic-opts)

  • topic-name (string, required, default "tx-topic" for tx-log, "doc-topic" for document-store)

  • num-partitions (int, default 1)

  • replication-factor (int, default 1): level of durability for Kafka

  • create-topics? (boolean, default true): whether to create topics if they do not exist

  • topic-config (map): any further topic config to pass directly to Kafka
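
The topic options above might be combined as follows for a document topic. This is a sketch under assumptions: a replication factor of 3 presumes a cluster of at least three brokers, and topic-config is assumed to take string keys and values named after Kafka's topic-level settings ("min.insync.replicas" is one such setting).

```clojure
{:crux/document-store {:crux/module 'crux.kafka/->document-store
                       :kafka-config {:bootstrap-servers "localhost:9092"}
                       :doc-topic-opts {:topic-name "crux-docs"
                                        :num-partitions 4
                                        ;; Needs at least three brokers in the cluster.
                                        :replication-factor 3
                                        :create-topics? true
                                        ;; Assumed to be passed through to Kafka unchanged.
                                        :topic-config {"min.insync.replicas" "2"}}}}
```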

Transaction log (crux.kafka/->tx-log)

  • kafka-config (connection config)

  • tx-topic-opts (topic options)

  • poll-wait-duration (string/Duration, default 1 second, "PT1S"): time to wait on each Kafka poll.

  • poll-sleep-duration (string/Duration, default 1 second, "PT1S"): time to sleep between each poll, if the previous poll didn’t yield any transactions.
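
The string form of these durations, such as "PT1S", follows the ISO-8601 duration notation that `java.time.Duration` reads and prints. The short sketch below shows how the string and `Duration` spellings used in the examples correspond; `one-second` is just an illustrative name.

```clojure
(import 'java.time.Duration)

;; "PT1S" is the ISO-8601 spelling of a one-second duration.
(def one-second (Duration/parse "PT1S"))

(= one-second (Duration/ofSeconds 1)) ; true
(.toMillis (Duration/parse "PT0.5S")) ; 500
```

The string form suits JSON and EDN files, where no `Duration` object can be constructed inline.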

Ingest-only transaction log (crux.kafka/->ingest-only-tx-log)

  • kafka-config (connection config)

  • tx-topic-opts (topic options)

Document store (crux.kafka/->document-store)

  • kafka-config (connection config)

  • doc-topic-opts (topic options)

  • local-document-store (document store, default local in-memory kv-store)

  • poll-wait-duration (string/Duration, default 1 second, "PT1S"): time to wait on each Kafka poll.

  • poll-sleep-duration (string/Duration, default 1 second, "PT1S"): time to sleep between each poll, if the previous poll didn’t yield any transactions.

Ingest-only document store (crux.kafka/->ingest-only-document-store)

  • kafka-config (connection config)

  • doc-topic-opts (topic options)

Contributors: James Henderson, Daniel Mason, Jeremy Taylor & Dan Mason