A simple Clojure library designed to ease integration with Apache Spark. It contains handy utilities, wrappers, and functional patterns. This is NOT a fully featured Clojure DSL. Features include easy schema configuration, data loading, and UDF generation.
(ns example.schema
  (:gen-class))

;; Schema definition; Use with dataframe loaders to set schema appropriately.
(def colspec
  (list
   {:name "foo" :type :int}
   {:name "bar" :type :long}
   {:name "baz" :type [:array :int]}))
(ns example.spark
  (:require [sparq-yoots.configuration.core :as sparq.conf])
  (:gen-class))

(defn run
  [spark-context ...]
  ...)

(defn -main
  [& args]
  (let [spark-context (sparq.conf/spark-context
                       conf
                       :app-name (parse-app-name args)
                       :master (parse-master args)
                       :spark-confs (parse-spark-confs args))]
    (run spark-context ...)))
...
(defn -main
  [& args]
  (let [spark-session (sparq.conf/spark-session
                       :app-name (parse-app-name args)
                       :master (parse-master args)
                       :spark-confs (parse-spark-confs args)
                       :with-hive true)]
    (run spark-session ...)))
(ns example.s3
  (:require [sparq-yoots.configuration.s3 :as sparq.s3])
  (:import [com.amazonaws.auth DefaultAWSCredentialsProviderChain])
  (:gen-class))

(defn configure-s3
  [ctx]
  (let [creds (.getCredentials (DefaultAWSCredentialsProviderChain.))]
    (sparq.s3/configure ctx creds)))

(defn -main
  [& args]
  (let [spark-context (...)]
    (configure-s3 spark-context)
    (run ...)))
(ns example.driver
  (:require [sparq-yoots.core :as sparq.core]
            [example.schema :as schema]
            [example.spark :as spark.conf])
  (:gen-class))

(let [df (sparq.core/load-dataframe spark-ctx path schema/colspec)]
  (run df))
(ns examples.functions
  (:import [sparq_yoots.functions UDF3 UDF5 UDF7])
  (:gen-class))
;; Create UDF3
(def foo (UDF3. (fn ^DoubleType [^DoubleType a ^DoubleType b ^DoubleType c] (* a b c))))
;;
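(ns examples.driver
  (:require [examples.functions :as func]
            [sparq-yoots.sql.core :as sparq.sql])
  ;; DataTypes is referenced below, so it has to be imported.
  (:import [org.apache.spark.sql.types DataTypes])
  (:gen-class))

(sparq.sql/register-function sql-ctx "foo" func/foo DataTypes/DoubleType)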
...
Use the gen-col macro to create named column functions.
(gen-col "col-1" "col_1")
(gen-col "tmp-col" "_temp_col")
(col-1) ;; "col_1"
(col-1 :field "foo") ;; "col_1.foo"
(col-1 :alias "a" :index 0 :as "foo") ;; "a.col_1[0] AS foo"
(col-1 :cast "int") ;; "CAST(col_1 AS int)"
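To make the behaviour of the generated column functions concrete, here is a minimal sketch of how such a macro could be written. It is not the sparq-yoots implementation: `column-expr` and `gen-col-sketch` are hypothetical names, and the sketch assumes that `:alias` takes the table alias as its value and that options are applied in the order alias, index, field, cast, as.

```clojure
(ns example.gen-col-sketch)

;; Hypothetical helper (not part of sparq-yoots): builds a SQL column
;; expression string from a raw column name and optional keyword arguments.
(defn column-expr
  [col-name & {:keys [alias field index cast as]}]
  (let [qualified (if alias (str alias "." col-name) col-name)
        ;; 0 is truthy in Clojure, so :index 0 is handled correctly.
        indexed   (if index (str qualified "[" index "]") qualified)
        nested    (if field (str indexed "." field) indexed)
        casted    (if cast (str "CAST(" nested " AS " cast ")") nested)]
    (if as (str casted " AS " as) casted)))

;; Hypothetical stand-in for gen-col: defines a function named `fn-name`
;; that renders expressions for the column `col-name`.
(defmacro gen-col-sketch
  [fn-name col-name]
  `(defn ~(symbol fn-name) [& opts#]
     (apply column-expr ~col-name opts#)))

(gen-col-sketch "col-1" "col_1")

(col-1)                               ;; => "col_1"
(col-1 :field "foo")                  ;; => "col_1.foo"
(col-1 :alias "a" :index 0 :as "foo") ;; => "a.col_1[0] AS foo"
(col-1 :cast "int")                   ;; => "CAST(col_1 AS int)"
```

Keeping the string building in a plain function and leaving only the `defn` generation to the macro makes the logic easy to test at the REPL; the real macro may be structured differently.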
Convenience functions during UDF processing.
(def FOO (partial sparq.sql/double-col 0))
(def BAR (partial sparq.sql/bool-col 1))
(def BAZ (partial sparq.sql/int-col 2))
Copyright © 2019 Navil Charles
Distributed under the Eclipse Public License either version 1.0 or (at your option) any later version.