Apache Beam and Google Cloud Dataflow on ~~steroids~~ Clojure. The walkthrough explains everything.
This is alpha software. Always use the latest version and watch the release notes carefully. API subject to mood swings.
`cd` into this repository, then start a REPL with `lein repl`:
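```clojure
(ns try-thurber
  (:require [thurber :as th]
            [thurber.sugar :refer :all]))

(->
  (th/create-pipeline)
  (th/apply!
    ;; read each line of the file
    (read-text-file
      "demo/word_count/lorem.txt")
    ;; split each line into words, dropping empty tokens
    (th/fn* extract-words [sentence]
      (remove empty? (.split sentence "[^\\p{L}]+")))
    ;; count occurrences of each word
    (count-per-element)
    ;; render each [word count] pair as text
    (th/fn* format-as-text
      [[k v]] (format "%s: %d" k v))
    ;; log each formatted line
    (log-sink))
  (th/run-pipeline!))
```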
Output:

```
...
INFO thurber - extremely: 1
INFO thurber - undertakes: 1
INFO thurber - pleasure: 7
INFO thurber - you: 2
...
```
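To make clear what the pipeline computes, here is a hedged, single-machine sketch of the same per-element logic in plain Clojure. It does not use Beam or thurber: the `word-count-sketch` namespace and the `word-counts` function are hypothetical, and `frequencies` merely stands in for `count-per-element`, which Beam evaluates in a distributed fashion.

```clojure
(ns word-count-sketch
  (:require [clojure.string :as str]))

;; Hypothetical local equivalent of extract-words: split on runs of
;; non-letter characters and drop the empty tokens that a leading
;; separator can produce.
(defn extract-words [sentence]
  (remove empty? (str/split sentence #"[^\p{L}]+")))

;; Hypothetical stand-in for count-per-element: on one machine,
;; `frequencies` returns a map of word -> number of occurrences.
(defn word-counts [lines]
  (frequencies (mapcat extract-words lines)))

;; Same formatting as format-as-text in the pipeline: destructure a
;; [word count] pair into "word: count".
(defn format-as-text [[k v]]
  (format "%s: %d" k v))
```

For example, `(get (word-counts ["Lorem ipsum, lorem!" "ipsum dolor"]) "ipsum")` returns `2`. Note that counting is case-sensitive, so `"Lorem"` and `"lorem"` are counted separately, just as the `.split`-based pipeline code would treat them.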
Each namespace in the `demo/` source directory is a pipeline written in Clojure using thurber. Comments in the source highlight salient aspects of thurber usage.

Along with the code walkthrough, these demos are the best way to learn thurber's API, and they serve as recipes for various scenarios (use of tags, side inputs, windowing, combining, Beam's State API, etc.).
To execute a demo, start a REPL and evaluate `(demo!)` from within the respective namespace.
The `word_count` package contains ports of Beam's Word Count Examples to Clojure/thurber.
Beam's Mobile Gaming Examples have also been ported to Clojure using thurber. These are fully functional ports, and they require deployment to GCP Dataflow.
First make your pipeline work. Then, optionally or as required, optimize:

- `aget` is explicitly overloaded for primitive arrays, so type hinting primitive and `Object` arrays may be essential to get the optimal invocation.

NOTE: Aggressive optimizations may buy you some bottom-line cost improvements, but Beam achieves linear scalability, and the slight overhead of clean Clojure code is often worth more than the savings an aggressive optimization brings. Aggressive optimization of a Beam job is more likely to reduce bottom-line resource cost than to improve throughput or latency, so keep this in mind as you weigh the development and testing effort that aggressive tuning requires.
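The point about `aget` can be illustrated in plain Clojure, independent of Beam. This is a hedged sketch: the namespace and function names are hypothetical, and it only shows how type hints let the compiler choose an `aget` overload statically instead of via reflection. It makes no claim about how thurber itself handles arrays.

```clojure
(ns aget-hints-sketch)

;; Ask the compiler to report calls it cannot resolve statically.
(set! *warn-on-reflection* true)

;; Unhinted: `xs` is only known to be an Object, so this aget call is
;; resolved by reflection at runtime (the compiler prints a warning).
(defn first-elem-reflective [xs]
  (aget xs 0))

;; Hinted as a primitive long array: the compiler selects the long[]
;; overload of aget directly, with no reflection.
(defn sum-longs [^longs xs]
  (areduce xs i acc 0 (+ acc (aget xs i))))

;; Hinted as an Object array: the compiler selects the Object[] overload.
(defn count-non-nil [^objects xs]
  (areduce xs i acc 0 (if (some? (aget xs i)) (inc acc) acc)))
```

All three functions return the same results whether or not they are hinted; the hints only change how the calls are dispatched. Loading the namespace with `*warn-on-reflection*` enabled is a quick way to find the unhinted call sites worth fixing.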
Copyright © 2020 Aaron Dixon
Like Clojure, distributed under the Eclipse Public License.