Sparkling is a Clojure API for Apache Spark.
(do
  (require '[sparkling.conf :as conf])
  (require '[sparkling.core :as spark])
  (spark/with-context sc (-> (conf/spark-conf) ; this creates a Spark context from the given config
                             (conf/app-name "sparkling-test")
                             (conf/master "local"))
    (let [lines-rdd (spark/into-rdd sc ["This is the first line" ;; here we provide data from a Clojure collection.
                                        "Testing spark"          ;; You could also read from a text file, or avro file.
                                        "and sparkling"          ;; You could even approach a JDBC datasource
                                        "Happy hacking!"])]
      (spark/collect               ;; get every element from the filtered RDD
        (spark/filter              ;; filter elements in the given RDD (lines-rdd)
          #(.contains % "spark")   ;; a pure Clojure function as filter predicate
          lines-rdd)))))
Check out our site for information about Gorillalabs Sparkling and a getting started guide.
Just clone our getting-started repo and get going right now.
One thing to be aware of: certain namespaces need to be AOT-compiled, e.g. because their classes are referenced by name during the startup process. I do this in my project.clj using the :aot directive, like this:
:aot [#".*" sparkling.serialization sparkling.destructuring]
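For orientation, here is a minimal sketch of where that directive might sit in a project.clj. The project name, version, Clojure version and :main namespace are placeholders chosen for illustration, not values from Sparkling's documentation, and the Sparkling and Spark dependency coordinates are deliberately left out.

```clojure
;; project.clj (illustrative sketch; all names and versions are placeholders)
(defproject my-sparkling-app "0.1.0-SNAPSHOT"
  ;; add the Sparkling and Apache Spark dependencies to this vector
  :dependencies [[org.clojure/clojure "1.6.0"]]
  ;; the :aot vector from the text above: a catch-all regex
  ;; plus the two Sparkling namespaces named there
  :aot [#".*" sparkling.serialization sparkling.destructuring]
  ;; hypothetical entry point namespace
  :main my-sparkling-app.core)
```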
Sparkling is available from Clojars. To use with Leiningen, add the Sparkling dependency to your project.clj.
whole-text-files in sparkling.core (thanks to Jase Bell).
Feel free to fork the Sparkling repository, improve stuff and open up a pull request against our "develop" branch. However, we'll only add features with tests, so make sure everything is green ;)
Thanks to The Climate Corporation and their open source clj-spark project, and to Yieldbot for yieldbot/flambo which served as the starting point for this project.
Copyright (C) 2014-2015 Dr. Christian Betz, and the Gorillalabs team.
Distributed under the Eclipse Public License either version 1.0 or (at your option) any later version.