Create a Dataset from a path or a collection of records.
(clip expr low high)
Returns a new Column where values outside `[low, high]` are clipped to the interval edges.
(cut expr bins)
Returns a new Column that discretises `expr` into the intervals defined by `bins`.
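
The following is a minimal sketch of how `clip` and `cut` might be called. It assumes that the `g` alias used in the examples on this page refers to `zero-one.geni.core`, that every function documented here is reachable through that alias, and that a default Spark session is available. The column name `:a`, the bounds and the bin edges are made up for illustration.

```clojure
;; Assumption: `g` is an alias for zero-one.geni.core.
(require '[zero-one.geni.core :as g])

;; Illustrative data only: a single numeric column :a.
(def df (g/map->dataset {:a [1 5 12 30]}))

;; Values below 2 or above 20 are clipped to the interval edges.
(def clipped (g/select-columns df :a (g/clip :a 2 20)))

;; Discretise :a into the intervals delimited by the bin edges 0, 10 and 20.
(def binned (g/select-columns df :a (g/cut :a [0 10 20])))

(g/show clipped)
(g/show binned)
```
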
(name-value-seq->dataset map-of-values)
(name-value-seq->dataset spark map-of-values)
Construct a Dataset from an associative map.
```clojure
(g/show (g/map->dataset {:a [1 2], :b [3 4]}))
; +---+---+
; |a |b |
; +---+---+
; |1 |3 |
; |2 |4 |
; +---+---+
```
(nlargest dataframe n-rows expr)
Return the Dataset with the first `n-rows` rows ordered by `expr` in descending order.
(nsmallest dataframe n-rows expr)
Return the Dataset with the first `n-rows` rows ordered by `expr` in ascending order.
(nunique dataframe)
Count distinct observations over all columns in the Dataset.
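
As a hedged illustration of the three functions above, the sketch below builds a small Dataset and applies each of them. It assumes that `g` is an alias for `zero-one.geni.core` and that a default Spark session is available. The data and column names are invented for the example.

```clojure
;; Assumption: `g` is an alias for zero-one.geni.core.
(require '[zero-one.geni.core :as g])

;; Illustrative data only.
(def df (g/map->dataset {:a [1 2 3], :b [30 10 20]}))

;; The two rows with the largest values of :b.
(def top-two (g/nlargest df 2 :b))

;; The single row with the smallest value of :a.
(def bottom-one (g/nsmallest df 1 :a))

;; Distinct-observation counts over the columns of df.
(def distinct-counts (g/nunique df))

(g/show top-two)
(g/show bottom-one)
```
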
(qcut expr num-buckets-or-probs)
Returns a new Column that discretises `expr` into equal-sized buckets, based on rank or on sample quantiles.
(random-choice choices)
(random-choice choices probs)
(random-choice choices probs seed)
Returns a new Column of a random sample from a given collection of `choices`.
(random-exp)
(random-exp rate)
(random-exp rate seed)
Returns a new Column of draws from an exponential distribution.
(random-int)
(random-int low high)
(random-int low high seed)
Returns a new Column of random integers from `low` (inclusive) to `high` (exclusive).
(random-norm)
(random-norm mu sigma)
(random-norm mu sigma seed)
Returns a new Column of draws from a normal distribution.
(random-uniform)
(random-uniform low high)
(random-uniform low high seed)
Returns a new Column of draws from a uniform distribution.
(rchoice choices)
(rchoice choices probs)
(rchoice choices probs seed)
Returns a new Column of a random sample from a given collection of `choices`.
(rexp)
(rexp rate)
(rexp rate seed)
Returns a new Column of draws from an exponential distribution.
(rnorm)
(rnorm mu sigma)
(rnorm mu sigma seed)
Returns a new Column of draws from a normal distribution.
(runif)
(runif low high)
(runif low high seed)
Returns a new Column of draws from a uniform distribution.
(runiform)
(runiform low high)
(runiform low high seed)
Returns a new Column of draws from a uniform distribution.
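
The random-column functions above all return Columns, so one plausible way to use them is inside a column selection. The sketch below assumes that `g` is an alias for `zero-one.geni.core`, that a default Spark session is available, and that the argument order follows the signatures listed above. The `:id` column, the seed `42` and the parameter values are arbitrary choices for illustration.

```clojure
;; Assumption: `g` is an alias for zero-one.geni.core.
(require '[zero-one.geni.core :as g])

;; Illustrative data only: the random columns get one value per row.
(def df (g/map->dataset {:id [1 2 3 4]}))

(def sampled
  (g/select-columns
    df
    :id
    (g/random-norm 0.0 1.0 42)                  ; mu, sigma, seed
    (g/random-uniform 0.0 10.0 42)              ; low, high, seed
    (g/random-int 0 5 42)                       ; low (inclusive), high (exclusive), seed
    (g/random-choice ["x" "y"] [0.3 0.7] 42)))  ; choices, probs, seed

(g/show sampled)
```

Passing an explicit seed, as in this sketch, is what one would typically do to make the draws repeatable between runs.
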
(select-columns dataframe & exprs)
Params: (cols: Column*)
Result: DataFrame
Selects a set of column-based expressions.
Since: 2.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.931Z
(shape dataframe)
Returns a vector representing the dimensionality of the Dataset.
(value-counts dataframe)
Returns a Dataset containing counts of unique rows. The resulting object will be in descending order so that the first element is the most frequently-occurring element.
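
A short sketch of `shape` and `value-counts` on a small Dataset with one repeated row follows. It assumes that `g` is an alias for `zero-one.geni.core` and that a default Spark session is available. The data is invented for the example.

```clojure
;; Assumption: `g` is an alias for zero-one.geni.core.
(require '[zero-one.geni.core :as g])

;; Illustrative data only: the row (1, "x") appears twice.
(def df (g/map->dataset {:a [1 1 2], :b ["x" "x" "y"]}))

;; A vector describing the dimensionality of df.
(def dims (g/shape df))

;; One row per unique (a, b) combination, most frequent first.
(def counts (g/value-counts df))

(println dims)
(g/show counts)
```
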