thurber

Apache Beam and Google Cloud Dataflow on ~~steroids~~ Clojure.

This is alpha software. Bleeding-edge and all that. API subject to mood swings.

Principles
Quickstart
Guide
- Transforms
- Coders
Demos
- Word Count
- Mobile Gaming Example
Make It Fast

Principles

Enable Clojure
- Bring Clojure's powerful, expressive toolkit (destructuring, immutability, REPL, async tools, and so on) to Apache Beam.
REPL Friendly
- Build and test your pipelines incrementally in the REPL.
- Learn Beam semantics (windowing, triggering) interactively.
Avoid Macros
- Limit macro infection. Most thurber constructions are macro-less.
No AOT
No Lock-in
- Pipelines can be composed of Clojure/thurber and Java transforms. Incrementally refactor your pipeline to Clojure or back to Java.
Not Afraid of Java Interop
- Wherever Clojure's Java Interop is performant and works cleanly with Beam, embrace it.
Completeness
- Support all Beam capabilities (Transforms, State & Timers, Side Inputs, Output Tags, etc.).
Performance
- Be finely tuned for data streaming.

Quickstart

Clone this repository and cd into it.
Start a REPL: lein repl
Copy and paste the following into the REPL:

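(ns try-thurber
  (:require [thurber :as th]
            [clojure.string :as str])
  (:import (org.apache.beam.sdk.io TextIO)))

;; Split a line on runs of non-letter characters and drop empty strings.
(defn- extract-words [sentence]
  (remove empty? (str/split sentence #"[^\p{L}]+")))

(.run
  (doto (th/create-pipeline)
    (th/apply!
      ;; A plain Java Beam transform: read the demo text file.
      (-> (TextIO/read)
          (.from "demo/word_count/lorem.txt"))
      ;; Clojure functions are passed as vars.
      #'extract-words
      #'th/->kv
      (th/count-per-key)
      ;; Format each key/count pair as a line of text.
      (th/inline
        (fn format-as-text
          [[k v]] (format "%s: %d" k v)))
      #'th/log-elem*)))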

You should see streaming word counts:

...
INFO thurber - extremely: 1
INFO thurber - undertakes: 1
INFO thurber - pleasure: 7
INFO thurber - you: 2
...
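Because the per-element steps in the Quickstart are ordinary Clojure functions, they can be tried at the REPL before any pipeline is run. The sketch below uses only clojure.core and clojure.string; it is not thurber API. The helper name word-counts is hypothetical, and clojure.core/frequencies stands in for the counting that Beam performs in the real pipeline, so it only illustrates the expected shape of the output on a small in-memory input.

```clojure
(require '[clojure.string :as str])

;; Same function as in the Quickstart (made public here for REPL use).
(defn extract-words [sentence]
  (remove empty? (str/split sentence #"[^\p{L}]+")))

;; Same formatting logic as the inline fn in the Quickstart.
(defn format-as-text [[k v]]
  (format "%s: %d" k v))

;; Hypothetical helper, not part of thurber: an in-memory stand-in for the
;; pipeline, with `frequencies` playing the role of the per-key count.
(defn word-counts [lines]
  (->> lines
       (mapcat extract-words)
       frequencies
       (map format-as-text)
       sort))

(extract-words "...pleasure, and pain; pleasure!")
;; => ("pleasure" "and" "pain" "pleasure")

(word-counts ["...pleasure, and pain; pleasure!"])
;; => ("and: 1" "pain: 1" "pleasure: 2")
```

Once the functions behave as expected on sample input, they can be dropped into the pipeline as vars, as the Quickstart does with #'extract-words.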

Documentation

Code walkthrough

Demos

Each namespace in the demo/ source directory is a pipeline written in Clojure using thurber. Comments in the source highlight salient aspects of thurber usage.

These demos are the best way to learn thurber's API, and they serve as recipes for various scenarios (use of tags, side inputs, windowing, combining, Beam's State API, and so on).

To execute a demo, start a REPL and evaluate (demo!) from within the respective namespace.

Word Count

The word_count package contains ports of Beam's Word Count Examples to Clojure/thurber.

Mobile Gaming Example

Beam's Mobile Gaming Examples have been ported to Clojure using thurber.

These are fully functional ports but require deployment to GCP Dataflow. (How-to notes coming soon.)

Make It Fast

First make your pipeline work. Then make it fast.

Streaming/big data implies hot code paths.

Use Clojure type hints liberally.
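As a minimal sketch of what a type hint changes, the snippet below turns on reflection warnings and compares an unhinted Java interop call with a hinted one. The function names are hypothetical and nothing here is thurber-specific. How much this matters in a given pipeline is something to measure, but in code that runs once per element, reflective calls are a common and easily avoided cost.

```clojure
;; Ask the compiler to report interop calls it cannot resolve at compile time.
(set! *warn-on-reflection* true)

;; Unhinted: the compiler does not know the type of `word`, so the method
;; is looked up reflectively on every call (and a warning is printed).
(defn word-length-reflective [word]
  (.length word))

;; Hinted: ^String lets the compiler emit a direct call to String.length().
(defn word-length-hinted [^String word]
  (.length word))

(word-length-reflective "pleasure") ;; => 8
(word-length-hinted "pleasure")     ;; => 8
```

Both functions return the same result; the hinted version simply avoids the reflective lookup. Enabling *warn-on-reflection* while developing is a cheap way to find the spots that need a hint.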

If deploying to GCP, use Dataflow profiling to zero in on areas to optimize.

References

https://write.as/aaron-d/clojure-data-streaming-and-dodging-static-types

License

Like Clojure, thurber is distributed under the Eclipse Public License.
