"The entry point into all functionality in Spark is the SparkSession class."
Spark's Official Getting Started
Most Geni functions for dataset creation (including reading data from different sources) use a Spark session in the background. For instance, passing a Spark session as the first argument to `g/read-csv!` is optional. When no Spark session is passed, Geni falls back to its built-in default Spark session, which is designed to optimise for the out-of-the-box experience.
Note that the default Spark session is a delayed object that never gets instantiated unless invoked by these dataset-creation functions.
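For example, calling a dataset-creation function without a session is enough to force the delayed default session; a minimal sketch (the CSV path is illustrative):

```clojure
(require '[zero-one.geni.core :as g])

;; No session argument: Geni dereferences its delayed default Spark session
;; the first time a dataset-creation function like read-csv! is invoked.
(def dataframe (g/read-csv! "data/example.csv"))  ;; hypothetical path
```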
The following Scala Spark code:
```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder()
  .master("local")
  .appName("Basic Spark App")
  .config("spark.some.config.option", "some-value")
  .getOrCreate()
```
translates to:
```clojure
(require '[zero-one.geni.core :as g])

(g/create-spark-session
  {:master   "local"
   :app-name "Basic Spark App"
   :configs  {:spark.some.config.option "some-value"}})
```
It is also possible to specify `:log-level` and `:checkpoint-dir`, which are set at the `SparkContext` level. By default, Spark sets the log level to `INFO`. In contrast, Geni sets it to `WARN` for a less verbose default REPL experience.
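These two options can be passed in the same map as the other session settings; a sketch with illustrative values (the log level and checkpoint directory shown here are assumptions, not defaults):

```clojure
(require '[zero-one.geni.core :as g])

;; :log-level and :checkpoint-dir are applied to the underlying SparkContext.
(g/create-spark-session
  {:master         "local"
   :app-name       "Basic Spark App"
   :log-level      "ERROR"             ;; Geni's default is WARN; Spark's is INFO
   :checkpoint-dir "/tmp/checkpoint"}) ;; hypothetical directory
```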
Contributors: Burin Choomnuan & Anthony Khong