Where's The Spark Session?

"The entry point into all functionality in Spark is the SparkSession class."
— Spark's official Getting Started guide

Most Geni functions that create datasets (including those that read data from external sources) use a Spark session behind the scenes. For instance, g/read-csv! accepts an optional Spark session as its first argument. When no Spark session is passed, Geni falls back to the default Spark session defined in Geni's source. The default is designed to optimise the out-of-the-box experience.

Note that the default Spark session is a delayed object: it is never instantiated unless one of these dataset-creation functions invokes it.
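For example, the session argument can be omitted or passed explicitly. This is a minimal sketch; the file path and app name are placeholder values:

(require '[zero-one.geni.core :as g])

;; Uses the default (delayed) Spark session under the hood:
(g/read-csv! "data/example.csv")

;; Passes an explicit Spark session as the optional first argument:
(def spark (g/create-spark-session {:app-name "My App"}))
(g/read-csv! spark "data/example.csv")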

Creating A Spark Session

The following Scala Spark code:

import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder()
  .master("local")
  .appName("Basic Spark App")
  .config("spark.some.config.option", "some-value")
  .getOrCreate()

translates to:

(require '[zero-one.geni.core :as g])

(g/create-spark-session
  {:master   "local"
   :app-name "Basic Spark App"
   :configs  {:spark.some.config.option "some-value"}})

It is also possible to specify :log-level and :checkpoint-dir, both of which are set at the SparkContext level. Spark's default log level is INFO; Geni sets it to WARN instead, for a less verbose out-of-the-box REPL experience.
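For instance, a quieter session with a checkpoint directory might look like the following sketch (the app name, log level, and directory are placeholder values):

(require '[zero-one.geni.core :as g])

(g/create-spark-session
  {:master         "local[*]"
   :app-name       "Quiet Spark App"
   :log-level      "ERROR"             ;; overrides Geni's WARN default
   :checkpoint-dir "target/checkpoint"})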

Contributors: Burin Choomnuan & Anthony Khong