"The entry point into all functionality in Spark is the SparkSession class."
Spark's Official Getting Started
Most Geni functions for dataset creation (including reading data from different sources) use a Spark session in the background. For instance, g/read-csv! optionally takes a Spark session as its first argument. When no Spark session is passed, Geni uses its default Spark session, which is designed to optimise for the out-of-the-box experience.
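As a minimal sketch of the two call styles, the snippet below reads the same file once without a session and once with an explicit one. The temporary CSV file, its contents, and the names csv-path, df-default and df-explicit are made up for illustration; g/create-spark-session is covered further down.

```clojure
(require '[zero-one.geni.core :as g])

;; Hypothetical sample file, written to a temp location so the sketch is self-contained.
(def csv-path
  (let [f (java.io.File/createTempFile "geni-example" ".csv")]
    (spit f "name,age\nalice,30\nbob,25\n")
    (.getAbsolutePath f)))

;; No session passed: Geni uses its default Spark session.
(def df-default (g/read-csv! csv-path))

;; Explicit session passed as the first argument.
(def spark
  (g/create-spark-session {:master "local" :app-name "Explicit Session"}))
(def df-explicit (g/read-csv! spark csv-path))
```

Both calls should yield equivalent datasets; the only difference is which session does the reading.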
Note that the default Spark session is a delayed object that never gets instantiated unless invoked by these dataset-creation functions.
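The lazy behaviour can be sketched in plain Clojure. This is not Geni's source; it assumes the delayed object behaves like a standard Clojure delay, and default-resource, init-count and load-data are hypothetical stand-ins for the default session and a dataset-creation function.

```clojure
;; Counts how many times the expensive initialisation runs.
(def init-count (atom 0))

;; Stand-in for an expensive resource such as a Spark session.
(def default-resource
  (delay
    (swap! init-count inc)
    {:app-name "Hypothetical Default"}))

(realized? default-resource) ; false: nothing has been created yet

;; Stand-in for a dataset-creation function that falls back to the default.
(defn load-data
  ([path] (load-data @default-resource path))
  ([resource path] {:source path :via (:app-name resource)}))

(load-data "data/example.csv")
(realized? default-resource) ; true: the first call forced the delay

(load-data "data/other.csv")
@init-count ; 1: the body ran only once
```

Under this assumption, code that always passes its own session never pays the cost of creating the default one.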
The following Scala Spark code:
import org.apache.spark.sql.SparkSession
val spark = SparkSession
  .builder()
  .master("local")
  .appName("Basic Spark App")
  .config("spark.some.config.option", "some-value")
  .getOrCreate()
translates to:
(require '[zero-one.geni.core :as g])
(g/create-spark-session
  {:master   "local"
   :app-name "Basic Spark App"
   :configs  {:spark.some.config.option "some-value"}})
It is also possible to specify :log-level and :checkpoint-dir, which are set at the SparkContext level. By default, Spark sets the log level to INFO, whereas Geni sets it to WARN for a less verbose REPL experience.
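For example, a session that restores Spark's more verbose logging might look like the sketch below. It assumes the log level is given as a string, as in Spark's own setLogLevel; the app name and the checkpoint directory are hypothetical values chosen for illustration.

```clojure
(require '[zero-one.geni.core :as g])

(def spark
  (g/create-spark-session
    {:master         "local"
     :app-name       "Verbose Spark App"
     ;; Assumption: the level is passed as a string such as "INFO" or "WARN".
     :log-level      "INFO"
     ;; Hypothetical directory, used only for illustration.
     :checkpoint-dir "target/checkpoint/"}))
```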