Apache Beam and Google Cloud Dataflow on ~~steroids~~ Clojure. The walkthrough explains everything.
This is alpha software. Prefer the latest versions of thurber and the Beam Java SDK, and watch the release notes carefully. API subject to mood swings.
cd into this repository.
lein repl
    (ns try-thurber
      (:require [thurber :as th]
                [thurber.sugar :refer :all]))

    (->
      (th/create-pipeline)
      (th/apply!
        (read-text-file
          "demo/word_count/lorem.txt")
        (th/fn* extract-words [sentence]
          (remove empty? (.split sentence "[^\\p{L}]+")))
        (count-per-element)
        (th/fn* format-as-text
          [[k v]] (format "%s: %d" k v))
        (log-sink))
      (th/run-pipeline!))
Output:
...
INFO thurber - extremely: 1
INFO thurber - undertakes: 1
INFO thurber - pleasure: 7
INFO thurber - you: 2
...
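To see what the individual steps of the quickstart pipeline compute, the same logic can be tried on a couple of in-memory strings. The sketch below is plain Clojure and uses neither thurber nor Beam: `frequencies` is only a local stand-in for `count-per-element` (an assumption about equivalent behavior, based on the `[k v]` pairs that `format-as-text` receives), and `sentences` and `counts` are hypothetical names introduced for illustration.

```clojure
;; Plain-Clojure analog of the quickstart steps; no thurber or Beam involved.

(defn extract-words
  "Splits a sentence on runs of non-letter characters and drops empty tokens.
  A sentence that starts with punctuation yields a leading empty string from
  String.split, which is why the `remove empty?` step is needed."
  [^String sentence]
  (remove empty? (.split sentence "[^\\p{L}]+")))

(defn format-as-text
  "Formats a [word count] pair the same way the pipeline does."
  [[k v]]
  (format "%s: %d" k v))

;; Hypothetical sample input standing in for the lines of lorem.txt.
(def sentences ["Pleasure, you say?" "...you take pleasure!"])

;; `frequencies` stands in for count-per-element (assumption, see above).
(def counts
  (->> sentences
       (mapcat extract-words)
       frequencies))

(doseq [line (sort (map format-as-text counts))]
  (println line))
```

In the real pipeline these steps run as distributed transforms over the whole file rather than over a local sequence, so the order of the logged lines is not guaranteed.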
Each namespace in the demo/ source directory is a pipeline written in Clojure using thurber. Comments in the source highlight salient aspects of thurber usage.
Along with the code walkthrough, these demos are the best way to learn thurber's API, and they serve as recipes for various scenarios (use of tags, side inputs, windowing, combining, Beam's State API, etc.).
To execute a demo, start a REPL and evaluate (demo!) from within the respective namespace.
The word_count package contains ports of Beam's Word Count Examples to Clojure/thurber.
Beam's Mobile Gaming Examples (documented here) have been ported to Clojure using thurber.
These are fully functional ports. They require deployment to GCP Dataflow:
Beam has many I/O transforms — see here.
KafkaIO, for example, has some configuration nuances:
If you need help using thurber/Clojure with another I/O transform, you can open an issue to request any thurber demo code you'd like to see.
Streaming/big data implies hot code paths. thurber's core has been tuned for performance in various ways, but you may benefit from tuning your own pipeline code:
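As one illustration of this kind of tuning, type hints on primitive arrays let the Clojure compiler pick the primitive array accessors instead of falling back to reflection. The sketch below is plain Clojure and not part of thurber; `sum-doubles` is a hypothetical helper used only to show the pattern.

```clojure
;; Tip: evaluating (set! *warn-on-reflection* true) at the REPL makes the
;; compiler report calls that fall back to reflection.

;; Hypothetical helper, not part of thurber: sums a primitive double array.
;; The ^doubles hint lets alength and aget resolve to their primitive-array
;; overloads; the ^double hint keeps the return value unboxed.
(defn sum-doubles
  ^double [^doubles xs]
  (let [n (alength xs)]
    (loop [i 0
           acc 0.0]
      (if (< i n)
        (recur (unchecked-inc i) (+ acc (aget xs i)))
        acc))))

(println (sum-doubles (double-array [1.5 2.5 3.0])))
;; prints 7.0
```

Without the `^doubles` hint the function would still return the same result, but each array access would be resolved at run time, which is costly in a function that is invoked once per element.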
aget is explicitly overloaded for primitive arrays; type hinting is key here.

Copyright © 2020 Aaron Dixon
Like Clojure, distributed under the Eclipse Public License.