A Clojure library for fetching data from Elasticsearch as a lazy sequence. The following strategies are supported:
scroll API
search_after
The scroll API is the default strategy.
The purpose of the library is to provide an interface for consuming all or part of the data in Elasticsearch. Why would you need to do that:
index.max_result_window
The library is published on Clojars, so you can add it to your deps.edn:
{:deps {lazy-elasticsearch-scroll {:mvn/version "1.0.11"}}}
If you want to use the code straight from GitHub instead:
{:deps {lazy-elasticsearch-scroll {:git/url "https://github.com/dainiusjocas/lazy-elasticsearch-scroll.git"
:sha "396373ef6c37b0cba9576b66caf2c6133a03cf84"}}}
(require '[scroll :as scroll])
(scroll/hits
{:es-host "http://localhost:9200"
:index-name ".kibana"
:query {:query {:match_all {}}}})
;; =>
({:_id "space:default",
:_type "_doc",
:_score 1.0,
:_index ".kibana_1",
:_source {:space {:description "This is your default space!",
:color "#00bfb3",
:name "Default",
:_reserved true,
:disabledFeatures []},
:migrationVersion {:space "6.6.0"},
:type "space",
:references [],
:updated_at "2020-02-12T14:16:18.621Z"}}
{:_id "config:7.6.0",
:_type "_doc",
:_score 1.0,
:_index ".kibana_1",
:_source {:config {:buildNum 29000}, :type "config", :references [], :updated_at "2020-02-12T14:16:20.526Z"}})
;; Scroll through all the documents:
(scroll/hits {:es-host "http://localhost:9200"})
;; Fetch at most 10 docs:
(take 10 (scroll/hits
{:es-host "http://localhost:9200"
:index-name ".kibana"
:query {:query {:match_all {}}}}))
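Because the result is an ordinary lazy sequence, it composes with the usual sequence functions and transducers. The following is a minimal sketch, not part of the library: hit->doc and first-docs are hypothetical helpers, and sample-hits is a stand-in for the sequence that scroll/hits would return, so the snippet runs without a cluster.

```clojure
;; Hypothetical helper: keep only the id and the source of a hit.
(defn hit->doc [hit]
  {:id (:_id hit)
   :source (:_source hit)})

;; Hypothetical helper: transform hits and realize at most n of them.
;; The (take n) transducer stops the reduction early, so the rest of
;; the sequence is never consumed.
(defn first-docs [n hits]
  (into [] (comp (map hit->doc) (take n)) hits))

;; Stand-in for (scroll/hits {:es-host "http://localhost:9200"}):
;; an infinite lazy sequence of hit-shaped maps.
(def sample-hits
  (map (fn [i] {:_id (str i)
                :_index "scroll-test-index"
                :_source {:value i}})
       (range)))

(first-docs 2 sample-hits)
;; => [{:id "0", :source {:value 0}} {:id "1", :source {:value 1}}]
```

With a real cluster, the same helpers should apply unchanged to the value returned by scroll/hits, assuming the hits have the keywordized shape shown earlier.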
;; Do not keywordize keys
(scroll/hits
{:es-host "http://localhost:9200"
:opts {:keywordize? false}})
;; =>
({"_score" nil,
"_type" "_doc",
"sort" [0],
"_source" {"space" {"disabledFeatures" [],
"name" "Default",
"_reserved" true,
"color" "#00bfb3",
"description" "This is your default space!"},
"references" [],
"updated_at" "2020-02-12T14:16:18.621Z",
"type" "space",
"migrationVersion" {"space" "6.6.0"}},
"_id" "space:default",
"_index" ".kibana_1"}
{"_score" nil, "_type" "_doc", "sort" [0], "_source" {"value" 0}, "_id" "0", "_index" "scroll-test-index"})
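When keys are not keywordized, keyword lookups no longer find anything and string keys have to be used instead. A small sketch using a hit copied from the output above (no cluster needed):

```clojure
;; A hit shaped like the non-keywordized output above.
(def hit
  {"_score" nil
   "_type" "_doc"
   "sort" [0]
   "_source" {"value" 0}
   "_id" "0"
   "_index" "scroll-test-index"})

(:_id hit)                        ;; => nil, the key is the string "_id"
(get hit "_id")                   ;; => "0"
(get-in hit ["_source" "value"])  ;; => 0
```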
To specify the strategy, pass the :strategy key in the :opts map with one of the following values: :scroll-api or :search-after. For the scroll API:
(scroll/hits
{:es-host "http://localhost:9200"
:opts {:strategy :scroll-api}})
For search_after:
(scroll/hits
{:es-host "http://localhost:9200"
:opts {:strategy :search-after}})
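If the strategy comes from configuration or user input, it may be worth checking it before building the argument map. The sketch below is an illustration only: supported-strategies and request-with-strategy are hypothetical names, not part of the library, and how the library itself reacts to an unknown strategy is not covered here.

```clojure
;; The two strategy values named in this README.
(def supported-strategies #{:scroll-api :search-after})

;; Hypothetical helper: returns the argument map for scroll/hits with
;; the strategy set, or throws if the strategy is not a known one.
(defn request-with-strategy [request strategy]
  (when-not (contains? supported-strategies strategy)
    (throw (ex-info "Unsupported strategy"
                    {:strategy strategy
                     :supported supported-strategies})))
  (assoc-in request [:opts :strategy] strategy))

(request-with-strategy {:es-host "http://localhost:9200"} :search-after)
;; => {:es-host "http://localhost:9200", :opts {:strategy :search-after}}
```

The resulting map can then be passed to scroll/hits as in the examples above.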
The scroll API is the default choice because it is the most common and relatively convenient way of getting documents from Elasticsearch. However, it has several disadvantages:
The search-after strategy has several nice benefits:
search_after under the hood is filtering, and filters can be cached, so it is reasonably fast;
However, search-after is not a silver bullet:
sorting on _id is resource intensive, therefore you might get timeouts;
sorting on _doc is unpredictable because _doc is unique per shard;
Basic authorization is supported via environment variables:
ELASTIC_USERNAME, no default value
ELASTIC_PASSWORD, no default value
Run the development environment with make run-dev-env. This will start a docker-compose cluster with Elasticsearch and Kibana on exposed ports 9200 and 5601 respectively.
To run the development environment with a specific ELK version:
(export ES_VERSION=6.8.8 && make run-dev-env)
Run the integration tests locally with make run-integration-tests. This will start a docker-compose environment in which the integration tests will be run.
To run the integration tests with a specific ELK version:
(export ES_VERSION=6.8.8 && make run-integration-tests)
Copyright © 2020 Dainius Jocas.
Distributed under The Apache License, Version 2.0.