Create a Dataset from a path or a collection of records.
(clip expr low high)
Returns a new Column where values outside `[low, high]` are clipped to the interval edges.
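The clipping rule can be illustrated on plain numbers. This is a minimal sketch in ordinary Clojure, not Geni's implementation: `clip-value` is a hypothetical helper that works on scalars, whereas `clip` itself operates on Spark Columns.

```clojure
;; Hypothetical scalar analogue of `clip`; it does not touch Spark.
(defn clip-value
  "Clamp x to the closed interval [low, high]."
  [x low high]
  (-> x (max low) (min high)))

;; Values below 0 move up to 0, values above 10 move down to 10.
(def clipped (mapv #(clip-value % 0 10) [-5 3 10 42]))
;; clipped is [0 3 10 10]
```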
(cut expr bins)
Returns a new Column that discretises `expr` into the intervals given by `bins`.
(name-value-seq->dataset map-of-values)
(name-value-seq->dataset spark map-of-values)
Construct a Dataset from an associative map.
```clojure
(g/show (g/map->dataset {:a [1 2], :b [3 4]}))
; +---+---+
; |a  |b  |
; +---+---+
; |1  |3  |
; |2  |4  |
; +---+---+
```
(nlargest dataframe n-rows expr)
Return the Dataset with the first `n-rows` rows ordered by `expr` in descending order.
(nsmallest dataframe n-rows expr)
Return the Dataset with the first `n-rows` rows ordered by `expr` in ascending order.
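The ordering behind `nlargest` and `nsmallest` can be sketched on an in-memory sequence of maps. The helpers `n-largest` and `n-smallest` below are hypothetical plain-Clojure stand-ins, shown only to make the "sort, then take the first n rows" behaviour concrete; they are not how Geni evaluates the query on Spark.

```clojure
;; Toy rows standing in for a Dataset (illustrative data only).
(def rows [{:name "a" :score 3}
           {:name "b" :score 9}
           {:name "c" :score 5}])

;; Hypothetical analogues: sort by a key, then keep the first n rows.
(defn n-largest [rows n key-fn]
  (take n (sort-by key-fn > rows)))   ; descending order

(defn n-smallest [rows n key-fn]
  (take n (sort-by key-fn < rows)))   ; ascending order

(def top-two    (mapv :name (n-largest rows 2 :score)))
(def bottom-two (mapv :name (n-smallest rows 2 :score)))
;; top-two is ["b" "c"], bottom-two is ["a" "c"]
```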
(nunique dataframe)
Count distinct observations over all columns in the Dataset.
(qcut expr num-buckets-or-probs)
Returns a new Column that discretises `expr` into equal-sized buckets, based either on rank or on sample quantiles.
(random-choice choices)
(random-choice choices probs)
(random-choice choices probs seed)
Returns a new Column of a random sample from a given collection of `choices`.
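One common way to sample from `choices` with probabilities `probs` is inverse-CDF sampling. The sketch below shows that idea in plain Clojure with `java.util.Random`. It is an assumption about the general technique, not a description of Geni's internals, and `weighted-choice` and `sample-choices` are hypothetical names.

```clojure
;; Hypothetical helper: pick one element of `choices` according to `probs`
;; by drawing u in [0, 1) and walking the cumulative probabilities.
(defn weighted-choice
  [^java.util.Random rng choices probs]
  (let [u          (.nextDouble rng)
        cumulative (reductions + probs)]
    (or (some (fn [[choice upper]] (when (< u upper) choice))
              (map vector choices cumulative))
        ;; Fallback for floating-point rounding when probs sum to just under 1.
        (last choices))))

;; Draw n samples with a fixed seed so the result is reproducible.
(defn sample-choices
  [choices probs seed n]
  (let [rng (java.util.Random. (long seed))]
    (vec (repeatedly n #(weighted-choice rng choices probs)))))

(def draws (sample-choices [:red :green :blue] [0.2 0.3 0.5] 42 10))
```

Passing the same seed twice yields the same sequence, which mirrors the purpose of the optional `seed` argument.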
(random-exp)
(random-exp rate)
(random-exp rate seed)
Returns a new Column of draws from an exponential distribution.
(random-int)
(random-int low high)
(random-int low high seed)
Returns a new Column of random integers from `low` (inclusive) to `high` (exclusive).
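The half-open interval convention (`low` inclusive, `high` exclusive) can be made concrete with a small plain-Clojure sketch. `random-ints` is a hypothetical helper built on `java.util.Random`; it only illustrates the interval and the seeding behaviour and is not Geni's implementation.

```clojure
;; Hypothetical helper: n integers drawn uniformly from [low, high).
(defn random-ints
  [low high seed n]
  (let [rng   (java.util.Random. (long seed))
        width (int (- high low))]
    ;; .nextInt returns a value in [0, width), so adding low gives [low, high).
    (vec (repeatedly n #(+ low (.nextInt rng width))))))

(def sample (random-ints 5 8 1 200))
;; Every element of sample is 5, 6 or 7; 8 never appears.
```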
(random-norm)
(random-norm mu sigma)
(random-norm mu sigma seed)
Returns a new Column of draws from a normal distribution.
(random-uniform)
(random-uniform low high)
(random-uniform low high seed)
Returns a new Column of draws from a uniform distribution.
(rchoice choices)
(rchoice choices probs)
(rchoice choices probs seed)
Returns a new Column of a random sample from a given collection of `choices`.
(replace expr lookup-map)
(replace expr from-value-or-values to-value)
Returns a new Column where `from-value-or-values` is replaced with `to-value`.
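The two arities of `replace` can be sketched on plain values. The helper `replace-value` below is hypothetical and assumes that values with no match pass through unchanged; it works on scalars rather than Spark Columns and is meant only to show the lookup-map form next to the from/to form.

```clojure
;; Hypothetical scalar analogue of the two `replace` arities.
(defn replace-value
  ;; Lookup-map form: swap x for its mapped value, else keep x.
  ([x lookup-map]
   (get lookup-map x x))
  ;; From/to form: accepts a single value or a collection of values.
  ([x from-value-or-values to-value]
   (let [targets (if (coll? from-value-or-values)
                   (set from-value-or-values)
                   #{from-value-or-values})]
     (if (contains? targets x) to-value x))))

(def via-map    (mapv #(replace-value % {1 :one, 2 :two}) [1 2 3]))
(def via-values (mapv #(replace-value % [1 2] 0) [1 2 3]))
;; via-map is [:one :two 3], via-values is [0 0 3]
```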
(rexp)
(rexp rate)
(rexp rate seed)
Returns a new Column of draws from an exponential distribution.
(rnorm)
(rnorm mu sigma)
(rnorm mu sigma seed)
Returns a new Column of draws from a normal distribution.
(runif)
(runif low high)
(runif low high seed)
Returns a new Column of draws from a uniform distribution.
(runiform)
(runiform low high)
(runiform low high seed)
Returns a new Column of draws from a uniform distribution.
(select-columns dataframe & exprs)
Params: (cols: Column*)
Result: DataFrame
Selects a set of column based expressions.
Since: 2.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.931Z
(shape dataframe)
Returns a vector representing the dimensionality of the Dataset.
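As a rough illustration, the sketch below computes a shape for a column-oriented map like the one passed to `map->dataset` above. It assumes a `[row-count column-count]` ordering, as in pandas; that ordering is an assumption, and `shape-of` is a hypothetical plain-Clojure helper rather than part of Geni.

```clojure
;; Hypothetical helper: shape of a map from column name to column values.
;; Assumes all columns have the same length.
(defn shape-of
  [column-map]
  [(count (first (vals column-map)))   ; number of rows
   (count column-map)])                ; number of columns

(def example-shape (shape-of {:a [1 2 3], :b [4 5 6]}))
;; example-shape is [3 2]
```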
(value-counts dataframe)
Returns a Dataset containing counts of unique rows.
The resulting object will be in descending order so that the first element is the most frequently-occurring element.
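The "count unique rows, most frequent first" behaviour can be sketched with `frequencies` on an in-memory sequence of maps. `value-counts-of` is a hypothetical helper, and the `:count` key is an assumed name for the count column; neither is taken from Geni.

```clojure
;; Hypothetical analogue: count identical rows and sort by count, descending.
(defn value-counts-of
  [rows]
  (->> (frequencies rows)
       (sort-by val >)
       (mapv (fn [[row n]] (assoc row :count n)))))

(def counts (value-counts-of [{:a 1} {:a 2} {:a 1}]))
;; counts is [{:a 1, :count 2} {:a 2, :count 1}]
```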