zero-one.geni.core.dataset-creation


->schema

(->schema value)

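Coerces plain Clojure data structures to a Spark schema. In this shorthand, a
keyword such as `:short` names a primitive type, a one-element vector `[t]`
becomes an `ArrayType`, a two-element vector `[k v]` becomes a `MapType` from
`k` to `v`, and a nested map becomes a `StructType`.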

```clojure
(require '[zero-one.geni.core :as g])

(-> {:x [:short]
     :y [:string :int]
     :z {:a :float :b :double}}
    g/->schema
    g/->string)
; => StructType(
;      StructField(x,ArrayType(ShortType,true),true),
;      StructField(y,MapType(StringType,IntegerType,true),true),
;      StructField(
;        z,
;        StructType(
;          StructField(a,FloatType,true),
;          StructField(b,DoubleType,true)
;        ),
;        true
;      )
;    )
```
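The example implies a small set of rules for the shorthand. As a rough sketch of those rules only, here is a toy renderer in plain Clojure (no Spark required) that turns a spec into a string shaped like the output above. `describe-spec` and `type-names` are hypothetical names; the keyword table covers only the keywords used in the example, and every field and element is treated as nullable because that is what the example output shows. This is not geni's implementation, which builds real Spark `DataType` objects.

```clojure
(require '[clojure.string :as str])

;; Hypothetical lookup table, limited to the keywords that appear in the
;; ->schema example; geni's real mapping may differ and covers more types.
(def type-names
  {:short  "ShortType"
   :int    "IntegerType"
   :float  "FloatType"
   :double "DoubleType"
   :string "StringType"})

(defn describe-spec
  "Toy renderer for the schema shorthand (NOT geni's implementation):
  keyword -> primitive type, [t] -> array, [k v] -> map, {...} -> struct.
  Everything is rendered as nullable, mirroring the example output."
  [spec]
  (cond
    (keyword? spec)
    (or (type-names spec)
        (throw (ex-info "unknown type keyword" {:spec spec})))

    ;; [t] -> array whose elements have type t
    (and (vector? spec) (= 1 (count spec)))
    (str "ArrayType(" (describe-spec (first spec)) ",true)")

    ;; [k v] -> map from k to v
    (and (vector? spec) (= 2 (count spec)))
    (str "MapType(" (describe-spec (first spec)) ","
         (describe-spec (second spec)) ",true)")

    ;; {:field spec ...} -> struct with one field per map entry
    (map? spec)
    (str "StructType("
         (str/join "," (for [[k v] spec]
                         (str "StructField(" (name k) ","
                              (describe-spec v) ",true)")))
         ")")

    :else (throw (ex-info "unsupported spec" {:spec spec}))))

(describe-spec {:x [:short] :y [:string :int]})
;; => "StructType(StructField(x,ArrayType(ShortType,true),true),StructField(y,MapType(StringType,IntegerType,true),true))"
```

Nested maps recurse naturally, so `{:z {:a :float :b :double}}` renders as a `StructType` inside a `StructField`, matching the shape of the `z` field above.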

array-type

(array-type val-type nullable)

Creates an ArrayType whose elements have data type `val-type`; `nullable`
specifies whether the array may contain null elements.

create-dataframe

(create-dataframe rows schema)
(create-dataframe spark rows schema)

Params: (rdd: RDD[A])(implicit arg0: scala.reflect.api.JavaUniverse.TypeTag[A])

Result: DataFrame

Creates a DataFrame from an RDD of Product (e.g. case classes, tuples).

Since: 2.0.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/SparkSession.html

Timestamp: 2020-10-19T01:56:50.125Z


data-type->spark-type

A map from type keywords (such as `:int` or `:string`) to Spark data types.


java-type->spark-type

A mapping from Java types to Spark types.


map->dataset

(map->dataset map-of-values)
(map->dataset spark map-of-values)

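Constructs a Dataset from an associative map of column names to column values:
each key becomes a column, and the i-th elements of the value collections form
the i-th row.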

```clojure
(require '[zero-one.geni.core :as g])

(g/show (g/map->dataset {:a [1 2], :b [3 4]}))
; +---+---+
; |a  |b  |
; +---+---+
; |1  |3  |
; |2  |4  |
; +---+---+
```
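To see how the column-oriented input relates to the rows in the printed table, here is a plain-Clojure sketch (no Spark involved) that regroups the same map into one map per row. `columns->records` is a hypothetical helper for illustration only; it assumes all columns have the same length, and it is not how geni builds the Dataset internally.

```clojure
(defn columns->records
  "Hypothetical helper (not part of geni): turns a map of column vectors
  into a vector of row maps, pairing the i-th element of every column."
  [m]
  (let [ks (keys m)]
    ;; `keys` and `vals` of the same map share one order, and `map` over
    ;; several collections walks them in lockstep (stopping at the shortest).
    (vec (apply map (fn [& vs] (zipmap ks vs)) (vals m)))))

(columns->records {:a [1 2], :b [3 4]})
;; => [{:a 1, :b 3} {:a 2, :b 4}]
```

The result has the same rows as the table printed above, just written record by record.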

map-type

(map-type key-type val-type)

Creates a MapType whose keys have data type `key-type` and whose values have
data type `val-type`.

range (multimethod)

Creates a `Dataset` with a single `LongType` column named `id`.

The `Dataset` contains the values from `start` (default 0) up to, but excluding,
`end`, in steps of `step` (default 1).

If `num-partitions` is specified, the Dataset is distributed across that many
partitions; otherwise, Spark determines the number of partitions itself.
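As an analogy for which values end up in the `id` column, Clojure's own `clojure.core/range` follows the same convention described above: start inclusive (default 0), end exclusive, step default 1. This is plain Clojure with no Spark involved, and it says nothing about how geni's `range` multimethod orders or dispatches on its arguments.

```clojure
;; Plain clojure.core/range, used only to illustrate the start/end/step rules.
(range 5)       ; start defaults to 0, end 5 is excluded
;; => (0 1 2 3 4)

(range 2 10 3)  ; start 2, end 10 (excluded), step 3
;; => (2 5 8)
```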

records->dataset

(records->dataset records)
(records->dataset spark records)

Constructs a Dataset from a collection of maps, one row per map, using the map
keys as column names.

```clojure
(require '[zero-one.geni.core :as g])

(g/show (g/records->dataset [{:a 1 :b 2} {:a 3 :b 4}]))
; +---+---+
; |a  |b  |
; +---+---+
; |1  |2  |
; |3  |4  |
; +---+---+
```
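The same data can be viewed as a list of column names plus positional rows, which is the shape `table->dataset` takes. The plain-Clojure sketch below performs that split without Spark. `records->table` is a hypothetical helper; its choice of column order (distinct keys in first-seen order) and its use of `nil` for missing keys are assumptions of this sketch, not documented geni behavior.

```clojure
(defn records->table
  "Hypothetical helper (not part of geni): splits row maps into a vector of
  column names and a vector of positional rows."
  [records]
  ;; Column order: distinct keys in the order they are first seen.
  (let [col-names (vec (distinct (mapcat keys records)))
        ;; A record that lacks a column gets nil in that position.
        rows      (mapv (fn [r] (mapv #(get r %) col-names)) records)]
    {:col-names col-names :table rows}))

(records->table [{:a 1 :b 2} {:a 3 :b 4}])
;; => {:col-names [:a :b], :table [[1 2] [3 4]]}
```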

struct-field

(struct-field col-name data-type nullable)

Creates a StructField named `col-name` with data type `data-type`; `nullable`
specifies whether the field may hold null values.

struct-type

(struct-type & fields)

Creates a StructType from the given StructFields `fields`.

table->dataset

(table->dataset table col-names)
(table->dataset spark table col-names)

Constructs a Dataset from a collection of collections, where each inner
collection is one row and `col-names` names the columns in order.

```clojure
(require '[zero-one.geni.core :as g])

(g/show (g/table->dataset [[1 2] [3 4]] [:a :b]))
; +---+---+
; |a  |b  |
; +---+---+
; |1  |2  |
; |3  |4  |
; +---+---+
```
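Pairing each positional row with `col-names` yields the row maps that `records->dataset` accepts, so the two constructors describe the same table in different shapes. The plain-Clojure sketch below shows that pairing without Spark; `table->records` is a hypothetical helper, and because it relies on `zipmap`, a row shorter or longer than `col-names` is silently truncated to the shorter of the two.

```clojure
(defn table->records
  "Hypothetical helper (not part of geni): names each positional row with
  col-names, producing one map per row."
  [table col-names]
  ;; zipmap pairs names with values position by position.
  (mapv #(zipmap col-names %) table))

(table->records [[1 2] [3 4]] [:a :b])
;; => [{:a 1, :b 2} {:a 3, :b 4}]
```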
