(! expr)
Params: (e: Column)
Result: Column
Inversion of boolean expression, i.e. NOT.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.497Z
(% left-expr right-expr)
Params: (other: Any)
Result: Column
Modulo (a.k.a. remainder) expression.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.822Z
(& left-expr right-expr)
Params: (other: Any)
Result: Column
Compute bitwise AND of this expression with another expression.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.878Z
(&& & exprs)
Params: (other: Any)
Result: Column
Boolean AND.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.824Z
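A minimal usage sketch for the boolean operators above, assuming a namespace alias `g` for this library, a dataframe `df` bound elsewhere, hypothetical boolean columns `:active` and `:banned`, and the library's `filter` (documented elsewhere):

```clojure
;; Keep rows that are active and not banned.
(-> df
    (g/filter (g/&& :active (g/! :banned))))
```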
(* & exprs)
Params: (other: Any)
Result: Column
Multiplication of this expression and another expression.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.827Z
(** base exponent)
Params: (l: Column, r: Column)
Result: Column
Returns the value of the first argument raised to the power of the second argument.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.520Z
(+ & exprs)
Params: (other: Any)
Result: Column
Sum of this expression and another expression.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.829Z
(- & exprs)
Params: (other: Any)
Result: Column
Subtraction. Subtract the other expression from this expression.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.957Z
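A small sketch combining the arithmetic column functions above, assuming the `g` alias, a dataframe `df`, hypothetical numeric columns `:price`, `:quantity` and `:discount`, and `with-column` (documented elsewhere):

```clojure
(-> df
    (g/with-column :total     (g/* :price :quantity))
    (g/with-column :net-total (g/- (g/* :price :quantity) :discount))
    (g/with-column :remainder (g/% :quantity 3)))
```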
(->col-array args)
Coerce a coll of coerceable values into a coll of columns.
Params: (colName: String)
Result: Column
Returns a Column based on the given column name.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.258Z
Create a Dataset from a path or a collection of records.
(->date-col expr)
(->date-col expr date-format)
Params: (e: Column)
Result: Column
Converts the column into DateType by casting rules to DateType.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.616Z
Coerce to string, useful for debugging.
(->kebab-columns dataset)
Returns a new Dataset with all columns renamed to kebab case.
(->schema value)
Coerces plain Clojure data structures to a Spark schema.

```clojure
(-> {:x [:short]
     :y [:string :int]
     :z {:a :float :b :double}}
    g/->schema
    g/->string)
=> StructType(
     StructField(x,ArrayType(ShortType,true),true),
     StructField(y,MapType(StringType,IntegerType,true),true),
     StructField(
       z,
       StructType(
         StructField(a,FloatType,true),
         StructField(b,DoubleType,true)
       ),
       true
     )
   )
```
(->timestamp-col expr)
(->timestamp-col expr date-format)
Params: (s: Column)
Result: Column
Converts to a timestamp by casting rules to TimestampType.
A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
A timestamp, or null if the input was a string that could not be cast to a timestamp
2.2.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.623Z
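A hedged sketch of the date and timestamp coercions, assuming a dataframe `df` with a hypothetical string column `:raw-ts` such as "2020-10-19 01:56:22":

```clojure
(-> df
    (g/with-column :event-date (g/->date-col :raw-ts "yyyy-MM-dd HH:mm:ss"))
    (g/with-column :event-ts   (g/->timestamp-col :raw-ts "yyyy-MM-dd HH:mm:ss")))
```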
(->utc-timestamp expr)
Params: (ts: Column, tz: String)
Result: Column
Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in the given time zone, and renders that time as a timestamp in UTC. For example, 'GMT+1' would yield '2017-07-14 01:40:00.0'.
A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
A string detailing the time zone ID that the input should be adjusted to. It should be in the format of either region-based zone IDs or zone offsets. Region IDs must have the form 'area/city', such as 'America/Los_Angeles'. Zone offsets must be in the format '(+|-)HH:mm', for example '-08:00' or '+01:00'. Also 'UTC' and 'Z' are supported as aliases of '+00:00'. Other short names are not recommended to use because they can be ambiguous.
A timestamp, or null if ts was a string that could not be cast to a timestamp or tz was an invalid value
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.626Z
(/ & exprs)
Params: (other: Any)
Result: Column
Division of this expression by another expression.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.832Z
Params: (other: Any)
Result: Column
Less than.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.834Z
Params: (other: Any)
Result: Column
Less than or equal to.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.836Z
Params: (other: Any)
Result: Column
Equality test that is safe for null values.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.838Z
Params: (other: Any)
Result: Column
Equality test.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.843Z
Params: (other: Any)
Result: Column
Inequality test.
2.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.840Z
Params: (other: Any)
Result: Column
Equality test.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.843Z
Params: (other: Any)
Result: Column
Greater than.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.845Z
Params: (other: Any)
Result: Column
Greater than or equal to an expression.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.847Z
(abs expr)
Params: (e: Column)
Result: Column
Computes the absolute value of a numeric value.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.169Z
(acos expr)
Params: (e: Column)
Result: Column
inverse cosine of e in radians, as if computed by java.lang.Math.acos
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.171Z
(add cms item)
(add cms item cnt)
Params: (item: Any)
Result: Unit
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/util/sketch/CountMinSketch.html
Timestamp: 2020-10-19T01:56:26.095Z
(add-months expr months)
Params: (startDate: Column, numMonths: Int)
Result: Column
Returns the date that is numMonths after startDate.
A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
The number of months to add to startDate, can be negative to subtract months
A date, or null if startDate was a string that could not be cast to a date
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.174Z
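For example, on a hypothetical `:start-date` column of `df`:

```clojure
;; Twelve months after :start-date; a negative value would subtract months.
(-> df
    (g/with-column :renewal-date (g/add-months :start-date 12)))
```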
(agg dataframe & args)
Params: (aggExpr: (String, String), aggExprs: (String, String)*)
Result: DataFrame
(Scala-specific) Aggregates on the entire Dataset without groups.
2.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.739Z
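A sketch of aggregating the whole dataset without groups, assuming hypothetical columns `:user-id` and `:tag`; the map form (new column name to aggregate expression) follows the library's usual convention, so treat the exact shape as an assumption:

```clojure
(g/agg df
       {:n-users  (g/count-distinct :user-id)
        :all-tags (g/collect-set :tag)})
```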
(agg-all dataframe agg-fn)
Aggregates on all columns of the entire Dataset without groups.
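For example, applying one aggregate function to every column at once:

```clojure
;; Distinct counts for every column of `df`.
(g/agg-all df g/count-distinct)
```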
(aggregate expr init merge-fn)
(aggregate expr init merge-fn finish-fn)
Params: (expr: Column, initialValue: Column, merge: (Column, Column) ⇒ Column, finish: (Column) ⇒ Column)
Result: Column
Applies a binary operator to an initial state and all elements in the array, and reduces this to a single state. The final state is converted into the final result by applying a finish function.
the input array column
the initial value
(combined_value, input_value) => combined_value, the merge function to merge an input value to the combined_value
combined_value => final_value, the lambda function to convert the combined value of all inputs to final result
3.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.177Z
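A hedged sketch of folding an array column, assuming a hypothetical array column `:xs` and `g/lit` (documented elsewhere):

```clojure
;; Sum the elements of :xs, starting from 0.
(-> df
    (g/with-column :sum-xs
      (g/aggregate :xs (g/lit 0) (fn [acc x] (g/+ acc x)))))
```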
Column: Gives the column an alias.
Dataset: Returns a new Dataset with an alias set.
(app-name)
(app-name spark)
Params:
Result: String
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html
Timestamp: 2020-10-19T01:56:49.487Z
(approx-count-distinct expr)
(approx-count-distinct expr rsd)
Params: (e: Column)
Result: Column
Aggregate function: returns the approximate number of distinct items in a group. (The underlying Spark approxCountDistinct is deprecated since 2.1.0 in favour of approx_count_distinct.)
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.742Z
(approx-quantile dataframe col-or-cols probs rel-error)
Params: (col: String, probabilities: Array[Double], relativeError: Double)
Result: Array[Double]
Calculates the approximate quantiles of a numerical column of a DataFrame.
The result of this algorithm has the following deterministic bound: If the DataFrame has N elements and if we request the quantile at probability p up to error err, then the algorithm will return a sample x from the DataFrame so that the exact rank of x is close to (p * N). More precisely, floor((p - err) * N) <= rank(x) <= ceil((p + err) * N).
This method implements a variation of the Greenwald-Khanna algorithm (with some speed optimizations). The algorithm was first present in Space-efficient Online Computation of Quantile Summaries by Greenwald and Khanna.
the name of the numerical column
a list of quantile probabilities Each number must belong to [0, 1]. For example 0 is the minimum, 0.5 is the median, 1 is the maximum.
The relative target precision to achieve (greater than or equal to 0). If set to zero, the exact quantiles are computed, which could be very expensive. Note that values greater than 1 are accepted but give the same result as 1.
the approximate quantiles at the given probabilities
2.0.0
null and NaN values will be removed from the numerical column before calculation. If the dataframe is empty or the column only contains null or NaN, an empty array is returned.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/DataFrameStatFunctions.html
Timestamp: 2020-10-19T01:56:24.640Z
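For example, quartiles of a hypothetical numeric column `:price` within 1% relative error:

```clojure
(g/approx-quantile df :price [0.25 0.5 0.75] 0.01)
;; => a vector of three doubles
```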
(array & exprs)
Params: (cols: Column*)
Result: Column
Creates a new array column. The input columns must all have the same data type.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.184Z
(array-contains expr value)
Params: (column: Column, value: Any)
Result: Column
Returns null if the array is null, true if the array contains value, and false otherwise.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.185Z
(array-distinct expr)
Params: (e: Column)
Result: Column
Removes duplicate values from the array.
2.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.186Z
(array-except left right)
Params: (col1: Column, col2: Column)
Result: Column
Returns an array of the elements in the first array but not in the second array, without duplicates. The order of elements in the result is not determined
2.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.188Z
(array-intersect left right)
Params: (col1: Column, col2: Column)
Result: Column
Returns an array of the elements in the intersection of the given two arrays, without duplicates.
2.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.189Z
(array-join expr delimiter)
(array-join expr delimiter null-replacement)
Params: (column: Column, delimiter: String, nullReplacement: String)
Result: Column
Concatenates the elements of column using the delimiter. Null values are replaced with nullReplacement.
2.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.194Z
(array-max expr)
Params: (e: Column)
Result: Column
Returns the maximum value in the array.
2.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.195Z
(array-min expr)
Params: (e: Column)
Result: Column
Returns the minimum value in the array.
2.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.197Z
(array-position expr value)
Params: (column: Column, value: Any)
Result: Column
Locates the position of the first occurrence of the value in the given array as long. Returns null if either of the arguments are null.
2.4.0
The position is not zero based, but 1 based index. Returns 0 if value could not be found in array.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.198Z
(array-remove expr element)
Params: (column: Column, element: Any)
Result: Column
Removes all elements equal to element from the given array.
2.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.199Z
(array-repeat left right)
Params: (left: Column, right: Column)
Result: Column
Creates an array containing the left argument repeated the number of times given by the right argument.
2.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.201Z
(array-sort expr)
Params: (e: Column)
Result: Column
Sorts the input array in ascending order. The elements of the input array must be orderable. Null elements will be placed at the end of the returned array.
2.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.202Z
(array-type val-type nullable)
Creates an ArrayType by specifying the data type of elements `val-type` and whether the array contains null values `nullable`.
(array-union left right)
Params: (col1: Column, col2: Column)
Result: Column
Returns an array of the elements in the union of the given two arrays, without duplicates.
2.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.204Z
(arrays-overlap left right)
Params: (a1: Column, a2: Column)
Result: Column
Returns true if a1 and a2 have at least one non-null element in common. If not and both the arrays are non-empty and any of them contains a null, it returns null. It returns false otherwise.
2.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.209Z
(arrays-zip & exprs)
Params: (e: Column*)
Result: Column
Returns a merged array of structs in which the N-th struct contains all N-th values of input arrays.
2.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.211Z
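A sketch tying a few of the array functions above together on a dataframe `df`; the literal wrapping via `g/lit` (documented elsewhere) is an assumption:

```clojure
(-> df
    (g/with-column :xs       (g/array (g/lit 3) (g/lit 1) (g/lit 2)))
    (g/with-column :sorted   (g/array-sort :xs))
    (g/with-column :has-two? (g/array-contains :xs 2))
    (g/with-column :pairs    (g/arrays-zip :xs :sorted)))
```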
Column: Gives the column an alias.
Dataset: Returns a new Dataset with an alias set.
(asc expr)
Params:
Result: Column
Returns a sort expression based on ascending order of the column.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.867Z
(asc-nulls-first expr)
Params:
Result: Column
Returns a sort expression based on ascending order of the column, and null values return before non-null values.
2.1.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.869Z
(asc-nulls-last expr)
Params:
Result: Column
Returns a sort expression based on ascending order of the column, and null values appear after non-null values.
2.1.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.870Z
(ascii expr)
Params: (e: Column)
Result: Column
Computes the numeric value of the first character of the string column, and returns the result as an int column.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.216Z
(asin expr)
Params: (e: Column)
Result: Column
inverse sine of e in radians, as if computed by java.lang.Math.asin
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.219Z
Column: variadic version of `map-concat`. Dataset: variadic version of `with-column`.
(atan expr)
Params: (e: Column)
Result: Column
inverse tangent of e, as if computed by java.lang.Math.atan
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.221Z
(atan-2 expr-x expr-y)
Params: (y: Column, x: Column)
Result: Column
coordinate on y-axis
coordinate on x-axis
the theta component of the point (r, theta) in polar coordinates that corresponds to the point (x, y) in Cartesian coordinates, as if computed by java.lang.Math.atan2
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.233Z
(atan2 expr-x expr-y)
Params: (y: Column, x: Column)
Result: Column
coordinate on y-axis
coordinate on x-axis
the theta component of the point (r, theta) in polar coordinates that corresponds to the point (x, y) in Cartesian coordinates, as if computed by java.lang.Math.atan2
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.233Z
(base-64 expr)
Params: (e: Column)
Result: Column
Computes the BASE64 encoding of a binary column and returns it as a string column. This is the reverse of unbase64.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.236Z
(base64 expr)
Params: (e: Column)
Result: Column
Computes the BASE64 encoding of a binary column and returns it as a string column. This is the reverse of unbase64.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.236Z
(between expr lower-bound upper-bound)
Params: (lowerBound: Any, upperBound: Any)
Result: Column
True if the current column is between the lower bound and upper bound, inclusive.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.872Z
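For example, keeping rows whose hypothetical `:age` column lies in [18, 65], using the library's `filter` (documented elsewhere):

```clojure
(-> df
    (g/filter (g/between :age 18 65)))
```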
(bin expr)
Params: (e: Column)
Result: Column
An expression that returns the string representation of the binary value of the given long column. For example, bin("12") returns "1100".
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.238Z
Params: (path: String, minPartitions: Int)
Result: JavaPairRDD[String, PortableDataStream]
Read a directory of binary files from HDFS, a local file system (available on all nodes), or any Hadoop-supported file system URI as a byte array. Each file is read as a single record and returned in a key-value pair, where the key is the path of each file, the value is the content of each file.
A suggestion value of the minimal splitting number for input data.
Small files are preferred; very large files may cause bad performance.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html
Timestamp: 2020-10-19T01:56:49.492Z
(bit-size bloom)
Params: ()
Result: Long
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/util/sketch/BloomFilter.html
Timestamp: 2020-10-19T01:56:25.738Z
(bitwise-and left-expr right-expr)
Params: (other: Any)
Result: Column
Compute bitwise AND of this expression with another expression.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.878Z
(bitwise-not expr)
Params: (e: Column)
Result: Column
Computes bitwise NOT (~) of a number.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.239Z
(bitwise-or left-expr right-expr)
Params: (other: Any)
Result: Column
Compute bitwise OR of this expression with another expression.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.879Z
(bitwise-xor left-expr right-expr)
Params: (other: Any)
Result: Column
Compute bitwise XOR of this expression with another expression.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.881Z
(bloom-filter dataframe expr expected-num-items num-bits-or-fpp)
Params: (colName: String, expectedNumItems: Long, fpp: Double)
Result: BloomFilter
Builds a Bloom filter over a specified column.
name of the column over which the filter is built
expected number of items which will be put into the filter.
expected false positive probability of the filter.
2.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/DataFrameStatFunctions.html
Timestamp: 2020-10-19T01:56:24.647Z
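A sketch that builds a filter over a hypothetical `:user-id` column (about 10,000 expected items, 1% false-positive rate) and probes it on the driver via plain Java interop on the returned BloomFilter:

```clojure
(def user-filter (g/bloom-filter df :user-id 10000 0.01))
(.mightContain user-filter "some-user-id") ;; => true or false
```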
(broadcast dataframe)
Params: (df: Dataset[T])
Result: Dataset[T]
Marks a DataFrame as small enough for use in broadcast joins.
The following example marks the right DataFrame for broadcast hash join using joinKey.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.240Z
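A hedged sketch of the broadcast-join hint, assuming a small lookup table `small-df`, a hypothetical `:join-key` column, and the library's `join` (documented elsewhere):

```clojure
;; Hint that the small lookup table should be broadcast for the join.
(g/join df (g/broadcast small-df) :join-key)
```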
(bround expr)
Params: (e: Column)
Result: Column
Returns the value of the column e rounded to 0 decimal places with HALF_EVEN round mode.
2.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.243Z
(cache dataframe)
Params: ()
Result: Dataset.this.type
Persist this Dataset with the default storage level (MEMORY_AND_DISK).
1.6.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.750Z
(case expr & clauses)
Returns a new Column imitating Clojure's `case` macro behaviour.
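A hedged sketch of the clause style, by analogy with `clojure.core/case`: constant/result pairs with an optional trailing default, over a hypothetical `:n-rooms` column:

```clojure
(-> df
    (g/with-column :size
      (g/case :n-rooms
        1 "small"
        2 "medium"
        "large")))
```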
(cast expr new-type)
Params: (to: DataType)
Result: Column
Casts the column to a different data type.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.885Z
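A sketch of casting a hypothetical `:price` column; passing a Spark type name as a string is an assumption:

```clojure
(-> df
    (g/with-column :price-dbl (g/cast :price "double")))
```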
(cbrt expr)
Params: (e: Column)
Result: Column
Computes the cube-root of the given value.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.253Z
(ceil expr)
Params: (e: Column)
Result: Column
Computes the ceiling of the given value.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.255Z
(checkpoint dataframe)
(checkpoint dataframe eager)
Params: ()
Result: Dataset[T]
Eagerly checkpoint a Dataset and return the new Dataset. Checkpointing can be used to truncate the logical plan of this Dataset, which is especially useful in iterative algorithms where the plan may grow exponentially. It will be saved to files inside the checkpoint directory set with SparkContext#setCheckpointDir.
2.1.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.752Z
(checkpoint-dir)
(checkpoint-dir spark)
Params:
Result: Optional[String]
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html
Timestamp: 2020-10-19T01:56:49.509Z
(clip expr low high)
Returns a new Column where values outside `[low, high]` are clipped to the interval edges.
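For example, clamping a hypothetical `:score` column into [0.0, 1.0]:

```clojure
(-> df
    (g/with-column :score (g/clip :score 0.0 1.0)))
```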
Column: Returns the first column that is not null, or null if all inputs are null.
Dataset: Returns a new Dataset that has exactly numPartitions partitions, when fewer partitions are requested.
Params: (colName: String)
Result: Column
Returns a Column based on the given column name.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.258Z
(col-regex dataframe col-name)
Params: (colName: String)
Result: Column
Selects column based on the column name specified as a regex and returns it as Column.
2.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.758Z
(collect dataframe)
Params: ()
Result: Array[T]
Returns an array that contains all rows in this Dataset.
Running collect requires moving all the data into the application's driver process, and doing so on a very large dataset can crash the driver process with OutOfMemoryError.
For Java API, use collectAsList.
1.6.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.759Z
(collect-col dataframe col-name)
Returns a vector that contains all rows in the column of the Dataset.
(collect-list expr)
Params: (e: Column)
Result: Column
Aggregate function: returns a list of objects with duplicates.
1.6.0
The function is non-deterministic because the order of collected results depends on the order of the rows which may be non-deterministic after a shuffle.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.261Z
(collect-set expr)
Params: (e: Column)
Result: Column
Aggregate function: returns a set of objects with duplicate elements eliminated.
1.6.0
The function is non-deterministic because the order of collected results depends on the order of the rows which may be non-deterministic after a shuffle.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.263Z
(collect-to-arrow rdd chunk-size out-dir)
Collects the dataframe on the driver and exports it as Arrow files.
The data is transferred by partition, so each partition should be small enough to fit in the driver's heap space. The data is then saved to disk as Arrow files in chunks of `chunk-size` rows.
`rdd`: Spark dataset.
`chunk-size`: number of rows each Arrow file will have; should be small enough for the data to fit in the driver's heap space.
`out-dir`: output directory for the Arrow files.
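For example, exporting `df` in chunks of 10,000 rows to a hypothetical output directory:

```clojure
(g/collect-to-arrow df 10000 "/tmp/arrow-out")
```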
(collect-vals dataframe)
Returns the vector values of the Dataset collected.
(column-names dataframe)
Returns all column names as an array of strings.
(columns dataframe)
Returns all column names as an array of keywords.
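A short sketch of the column-inspection helpers on a dataframe `df`; the outputs shown are illustrative only:

```clojure
(g/column-names df)       ;; => e.g. ["price" "quantity"]
(g/columns df)            ;; => e.g. [:price :quantity]
(g/collect-col df :price) ;; => a vector of the values in :price
```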
(compatible? bloom other)
Params: (other: BloomFilter)
Result: Boolean
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/util/sketch/BloomFilter.html
Timestamp: 2020-10-19T01:56:25.740Z
(concat & exprs)
Params: (exprs: Column*)
Result: Column
Concatenates multiple input columns together into a single column. The function works with strings, binary and compatible array columns.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.265Z
(concat-ws sep & exprs)
Params: (sep: String, exprs: Column*)
Result: Column
Concatenates multiple input string columns together into a single string column, using the given separator.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.267Z
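For example, joining two hypothetical string columns with a space separator:

```clojure
(-> df
    (g/with-column :full-name (g/concat-ws " " :first-name :last-name)))
```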
(cond & clauses)
Returns a new Column imitating Clojure's `cond` macro behaviour.
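A hedged sketch of the clause style, by analogy with `clojure.core/cond`: condition/value pairs evaluated in order, over a hypothetical `:price` column:

```clojure
(-> df
    (g/with-column :price-band
      (g/cond
        (g/< :price 500000)   "low"
        (g/< :price 1500000)  "mid"
        (g/>= :price 1500000) "high")))
```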
(condp pred expr & clauses)
Returns a new Column imitating Clojure's `condp` macro behaviour.
(conf)
(conf spark)
Params:
Result: SparkConf
Return a copy of this JavaSparkContext's configuration. The configuration cannot be changed at runtime.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html
Timestamp: 2020-10-19T01:56:49.511Z
(confidence cms)
Params: ()
Result: Double
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/util/sketch/CountMinSketch.html
Timestamp: 2020-10-19T01:56:26.102Z
(contains expr literal)
Params: (other: Any)
Result: Column
Contains the other element. Returns a boolean column based on a string match.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.888Z
(conv expr from-base to-base)
Params: (num: Column, fromBase: Int, toBase: Int)
Result: Column
Convert a number in a string column from one base to another.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.268Z
Column: Aggregate function: returns the Pearson Correlation Coefficient for two columns.
Dataset: Calculates the Pearson Correlation Coefficient of two columns of a DataFrame.
(cos expr)
Params: (e: Column)
Result: Column
angle in radians
cosine of the angle, as if computed by java.lang.Math.cos
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.272Z
(cosh expr)
Params: (e: Column)
Result: Column
hyperbolic angle
hyperbolic cosine of the angle, as if computed by java.lang.Math.cosh
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.275Z
Column: Aggregate function: returns the number of items in a group.
Dataset: Returns the number of rows in the Dataset.
RelationalGroupedDataset: Count the number of rows for each group.
(count-distinct & exprs)
Params: (expr: Column, exprs: Column*)
Result: Column
Aggregate function: returns the number of distinct items in a group.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.279Z
(count-min-sketch dataframe expr eps-or-depth confidence-or-width seed)
Params: (colName: String, depth: Int, width: Int, seed: Int)
Result: CountMinSketch
Builds a Count-min Sketch over a specified column.
name of the column over which the sketch is built
depth of the sketch
width of the sketch
random seed
a CountMinSketch over column colName
2.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/DataFrameStatFunctions.html
Timestamp: 2020-10-19T01:56:24.659Z
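A sketch combining `count-min-sketch` with `estimate-count` (documented further down); the `:item` column, the depth/width/seed values and the `g` alias are assumptions.

```clojure
;; Build a sketch over :item with depth 10, width 200 and seed 42,
;; then query the approximate count of a single value.
(let [cms (g/count-min-sketch dataframe :item 10 200 42)]
  (g/estimate-count cms "apple"))
```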
(cov dataframe col-name1 col-name2)
Params: (col1: String, col2: String)
Result: Double
Calculate the sample covariance of two numerical columns of a DataFrame.
the name of the first column
the name of the second column
the covariance of the two columns.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/DataFrameStatFunctions.html
Timestamp: 2020-10-19T01:56:24.661Z
(covar l-expr r-expr)
Params: (column1: Column, column2: Column)
Result: Column
Aggregate function: returns the sample covariance for two columns.
2.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.284Z
(covar-pop l-expr r-expr)
Params: (column1: Column, column2: Column)
Result: Column
Aggregate function: returns the population covariance for two columns.
2.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.282Z
(covar-samp l-expr r-expr)
Params: (column1: Column, column2: Column)
Result: Column
Aggregate function: returns the sample covariance for two columns.
2.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.284Z
(crc-32 expr)
Params: (e: Column)
Result: Column
Calculates the cyclic redundancy check value (CRC32) of a binary column and returns the value as a bigint.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.285Z
(crc32 expr)
Params: (e: Column)
Result: Column
Calculates the cyclic redundancy check value (CRC32) of a binary column and returns the value as a bigint.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.285Z
(create-dataframe rows schema)
(create-dataframe spark rows schema)
Params: (rdd: RDD[A])
(implicit arg0: scala.reflect.api.JavaUniverse.TypeTag[A])
Result: DataFrame
Creates a DataFrame from an RDD of Product (e.g. case classes, tuples).
2.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/SparkSession.html
Timestamp: 2020-10-19T01:56:50.125Z
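A small sketch of building a Dataset from rows and an explicit schema; `g/row`, `g/struct-type` and `g/struct-field`, as well as the keyword data types, are assumed from the wider Geni API and are not documented in this section.

```clojure
;; Two rows with a :number long column and a :word string column.
(g/create-dataframe
  [(g/row 32 "horse")
   (g/row 64 "mouse")]
  (g/struct-type
    (g/struct-field :number :long true)
    (g/struct-field :word :string true)))
```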
(create-global-temp-view! dataframe view-name)
Creates a global temporary view using the given name. Global temporary view is cross-session. Its lifetime is the lifetime of the Spark application, i.e. it will be automatically dropped when the application terminates. It's tied to a system preserved database `global_temp`, and we must use the qualified name to refer to a global temp view, e.g. `SELECT * FROM global_temp.view1`.
(create-or-replace-global-temp-view! dataframe view-name)
Creates or replaces a global temporary view using the given name. Global temporary view is cross-session. Its lifetime is the lifetime of the Spark application, i.e. it will be automatically dropped when the application terminates. It's tied to a system preserved database `global_temp`, and we must use the qualified name to refer to a global temp view, e.g. `SELECT * FROM global_temp.view1`.
(create-or-replace-temp-view! dataframe view-name)
Creates or replaces a local temporary view using the given name. The lifetime of this temporary view is tied to the `SparkSession` that was used to create this Dataset.
(create-spark-session
{:keys [app-name master configs log-level checkpoint-dir]
:or {app-name "Geni App" master "local[*]" configs {} log-level "WARN"}})
The entry point to programming Spark with the Dataset and DataFrame API.
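Based on the option map shown above, a session might be created like the following sketch; the app name and the config key/value used are illustrative only.

```clojure
;; All keys are optional; unspecified ones fall back to the defaults above.
(def spark
  (g/create-spark-session
    {:app-name  "geni-demo"
     :master    "local[*]"
     :configs   {:spark.sql.shuffle.partitions "4"}
     :log-level "WARN"}))
```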
(create-temp-view! dataframe view-name)
Creates a local temporary view using the given name. Local temporary view is session-scoped. Its lifetime is the lifetime of the session that created it, i.e. it will be automatically dropped when the session terminates. It's not tied to any databases, i.e. we can't use `db1.view1` to reference a local temporary view.
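A sketch of registering a temporary view and querying it with SQL; `g/sql` and `g/show` are assumed from the wider Geni API, and `dataframe` and its columns are made up.

```clojure
;; Register the Dataset under the name "people", then query it.
(g/create-or-replace-temp-view! dataframe "people")

(-> (g/sql "SELECT name, age FROM people WHERE age > 30")
    (g/show))
```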
(cross-join left right)
Params: (right: Dataset[_])
Result: DataFrame
Explicit cartesian join with another DataFrame.
Right side of the join operation.
2.1.0
Cartesian joins are very expensive without an extra filter that can be pushed down.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.770Z
(crosstab dataframe col-name1 col-name2)
Params: (col1: String, col2: String)
Result: DataFrame
Computes a pair-wise frequency table of the given columns. Also known as a contingency table. The number of distinct values for each column should be less than 1e4. At most 1e6 non-zero pair frequencies will be returned. The first column of each row will be the distinct values of col1 and the column names will be the distinct values of col2. The name of the first column will be col1_col2. Counts will be returned as Longs. Pairs that have no occurrences will have zero as their counts. Null elements will be replaced by "null", and back ticks will be dropped from elements if they exist.
The name of the first column. Distinct items will make the first item of each row.
The name of the second column. Distinct items will make the column names of the DataFrame.
A DataFrame containing the contingency table.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/DataFrameStatFunctions.html
Timestamp: 2020-10-19T01:56:24.664Z
(cube dataframe & exprs)
Params: (cols: Column*)
Result: RelationalGroupedDataset
Create a multi-dimensional cube for the current Dataset using the specified columns, so we can run aggregation on them. See RelationalGroupedDataset for all the available aggregate functions.
2.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.778Z
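A sketch of cubing on two columns and aggregating, assuming `g/agg` and `g/sum` behave as in the wider Geni API and that `dataframe` has `:store`, `:product` and `:amount` columns (all assumptions).

```clojure
;; Produces subtotals for every combination of :store and :product,
;; including the grand total.
(-> dataframe
    (g/cube :store :product)
    (g/agg (g/sum :amount))
    (g/show))
```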
(cube-root expr)
Params: (e: Column)
Result: Column
Computes the cube-root of the given value.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.253Z
(cume-dist)
Params: ()
Result: Column
Window function: returns the cumulative distribution of values within a window partition, i.e. the fraction of rows that are below the current row.
1.6.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.286Z
(current-date)
Params: ()
Result: Column
Returns the current date as a date column.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.287Z
(current-timestamp)
Params: ()
Result: Column
Returns the current timestamp as a timestamp column.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.288Z
(cut expr bins)
Returns a new Column of discretised `expr` into the intervals of `bins`.
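A hypothetical sketch, assuming `bins` is an ordered Clojure vector of interval boundaries; the `:age` column, the boundary values and the use of `g/with-column` and `g/show` are assumptions.

```clojure
;; Discretise :age into the intervals delimited by 18 and 65.
(-> dataframe
    (g/with-column :age-group (g/cut :age [18 65]))
    (g/show))
```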
(date-add expr days)
Params: (start: Column, days: Int)
Result: Column
Returns the date that is `days` days after `start`.
A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
The number of days to add to start, can be negative to subtract days
A date, or null if start was a string that could not be cast to a date
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.295Z
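For instance, shifting a date column by a week in both directions (the `:start-date` column, the `g` alias and `g/select`/`g/show` are assumptions; `date-sub` is documented further down):

```clojure
(-> dataframe
    (g/select :start-date
              (g/date-add :start-date 7)
              (g/date-sub :start-date 7))
    (g/show))
```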
(date-diff l-expr r-expr)
Params: (end: Column, start: Column)
Result: Column
Returns the number of days from start to end.
Only considers the date part of the input.
A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
An integer, or null if either end or start were strings that could not be cast to a date. Negative if end is before start
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.304Z
(date-format expr date-fmt)
Params: (dateExpr: Column, format: String)
Result: Column
Converts a date/timestamp/string to a value of string in the format specified by the date format given by the second argument.
See Datetime Patterns for valid date and time format patterns
A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
A pattern dd.MM.yyyy would return a string like 18.03.1993
A string, or null if dateExpr was a string that could not be cast to a timestamp
1.5.0
IllegalArgumentException if the format pattern is invalid
Use specialized functions like year whenever possible as they benefit from a specialized implementation.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.297Z
(date-sub expr days)
Params: (start: Column, days: Int)
Result: Column
Returns the date that is `days` days before `start`.
A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
The number of days to subtract from start, can be negative to add days
A date, or null if start was a string that could not be cast to a date
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.300Z
(date-trunc fmt expr)
Params: (format: String, timestamp: Column)
Result: Column
Returns timestamp truncated to the unit specified by the format.
For example, date_trunc("year", "2018-11-19 12:01:19") returns 2018-01-01 00:00:00
A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
A timestamp, or null if timestamp was a string that could not be cast to a timestamp or format was an invalid value
2.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.302Z
(datediff l-expr r-expr)
Params: (end: Column, start: Column)
Result: Column
Returns the number of days from start to end.
Only considers the date part of the input.
A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
An integer, or null if either end or start were strings that could not be cast to a date. Negative if end is before start
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.304Z
(day-of-month expr)
Params: (e: Column)
Result: Column
Extracts the day of the month as an integer from a given date/timestamp/string.
An integer, or null if the input was a string that could not be cast to a date
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.305Z
(day-of-week expr)
Params: (e: Column)
Result: Column
Extracts the day of the week as an integer from a given date/timestamp/string. Ranges from 1 for a Sunday through to 7 for a Saturday
An integer, or null if the input was a string that could not be cast to a date
2.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.306Z
(day-of-year expr)
Params: (e: Column)
Result: Column
Extracts the day of the year as an integer from a given date/timestamp/string.
An integer, or null if the input was a string that could not be cast to a date
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.307Z
(dayofmonth expr)
Params: (e: Column)
Result: Column
Extracts the day of the month as an integer from a given date/timestamp/string.
An integer, or null if the input was a string that could not be cast to a date
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.305Z
(dayofweek expr)
Params: (e: Column)
Result: Column
Extracts the day of the week as an integer from a given date/timestamp/string. Ranges from 1 for a Sunday through to 7 for a Saturday
An integer, or null if the input was a string that could not be cast to a date
2.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.306Z
(dayofyear expr)
Params: (e: Column)
Result: Column
Extracts the day of the year as an integer from a given date/timestamp/string.
An integer, or null if the input was a string that could not be cast to a date
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.307Z
(dec expr)
Returns an expression one less than `expr`.
(decode expr charset)
Params: (value: Column, charset: String)
Result: Column
Computes the first argument into a string from a binary using the provided character set (one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16'). If either argument is null, the result will also be null.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.309Z
(default-min-partitions)
(default-min-partitions spark)
Params:
Result: Integer
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html
Timestamp: 2020-10-19T01:56:49.503Z
(default-parallelism)
(default-parallelism spark)
Params:
Result: Integer
Default level of parallelism to use when not given by user (e.g. parallelize and makeRDD).
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html
Timestamp: 2020-10-19T01:56:49.504Z
(degrees expr)
Params: (e: Column)
Result: Column
Converts an angle measured in radians to an approximately equivalent angle measured in degrees.
angle in radians
angle in degrees, as if computed by java.lang.Math.toDegrees
2.1.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.312Z
(dense & values)
Params: (firstValue: Double, otherValues: Double*)
Result: Vector
Creates a dense vector from its values.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/ml/linalg/Vectors$.html
Timestamp: 2020-10-19T01:56:35.334Z
(dense-rank)
Params: ()
Result: Column
Window function: returns the rank of rows within a window partition, without any gaps.
The difference between rank and dense_rank is that dense_rank leaves no gaps in the ranking sequence when there are ties. That is, if you were ranking a competition using dense_rank and had three people tie for second place, you would say that all three were in second place and that the next person came in third. The rank function, by contrast, gives sequential numbers, so the person who came in third place (after the ties) would register as coming in fifth.
This is equivalent to the DENSE_RANK function in SQL.
1.6.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.313Z
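A sketch of using `dense-rank` over a window; `g/over` and `g/window` (with `:partition-by`/`:order-by` keys) are assumed from the wider Geni API and are not documented in this section, and the column names are made up. `desc` is documented below.

```clojure
;; Rank rows within each :department by descending :salary, without gaps.
(-> dataframe
    (g/with-column :rank
      (g/over (g/dense-rank)
              (g/window {:partition-by [:department]
                         :order-by     [(g/desc :salary)]})))
    (g/show))
```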
(depth cms)
Params: ()
Result: Int
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/util/sketch/CountMinSketch.html
Timestamp: 2020-10-19T01:56:26.103Z
(desc expr)
Params:
Result: Column
Returns a sort expression based on the descending order of the column.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.890Z
(desc-nulls-first expr)
Params:
Result: Column
Returns a sort expression based on the descending order of the column, and null values appear before non-null values.
2.1.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.891Z
(desc-nulls-last expr)
Params:
Result: Column
Returns a sort expression based on the descending order of the column, and null values appear after non-null values.
2.1.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.893Z
(describe dataframe & col-names)
Params: (cols: String*)
Result: DataFrame
Computes basic statistics for numeric and string columns, including count, mean, stddev, min, and max. If no columns are given, this function computes statistics for all numerical or string columns.
This function is meant for exploratory data analysis, as we make no guarantee about the backward compatibility of the schema of the resulting Dataset. If you want to programmatically compute summary statistics, use the agg function instead.
Use summary for expanded statistics and control over which statistics to compute.
Columns to compute statistics on.
1.6.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.780Z
Flag for controlling the storage of an RDD.
The DataFrame is stored only on disk, and CPU computation time is high because I/O is involved.
Flag for controlling the storage of an RDD.
Same as the disk-only storage level, but replicates each partition to two cluster nodes.
Column: Returns a map whose keys are not in `ks`.
Dataset: variadic version of `drop`.
(distinct dataframe)
Params: ()
Result: Dataset[T]
Returns a new Dataset that contains only the unique rows from this Dataset. This is an alias for dropDuplicates.
2.0.0
Equality checking is performed directly on the encoded representation of the data and thus is not affected by a custom equals function defined on T.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.781Z
(drop dataframe & col-names)
Params: (colName: String)
Result: DataFrame
Returns a new Dataset with a column dropped. This is a no-op if schema doesn't contain column name.
This method can only be used to drop top-level columns. The colName string is treated literally, without further interpretation.
2.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.785Z
(drop-duplicates dataframe & col-names)
Params: ()
Result: Dataset[T]
Returns a new Dataset that contains only the unique rows from this Dataset. This is an alias for distinct.
For a static batch Dataset, it just drops duplicate rows. For a streaming Dataset, it will keep all data across triggers as intermediate state to drop duplicate rows. You can use withWatermark to limit how late the duplicate data can be, and the system will accordingly limit the state. In addition, data older than the watermark will be dropped to avoid any possibility of duplicates.
2.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.791Z
(drop-na dataframe)
(drop-na dataframe min-non-nulls-or-cols)
(drop-na dataframe min-non-nulls cols)
Params: ()
Result: DataFrame
Returns a new DataFrame that drops rows containing any null or NaN values.
1.3.1
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/DataFrameNaFunctions.html
Timestamp: 2020-10-19T01:56:23.886Z
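The three arities above might be used as follows; the column names are assumptions, and the behaviour descriptions in the comments follow the scraped Spark docs.

```clojure
(g/drop-na dataframe)            ;; drop rows containing any null or NaN value
(g/drop-na dataframe 2)          ;; keep rows with at least 2 non-null values
(g/drop-na dataframe 1 [:x :y])  ;; consider only columns :x and :y
```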
(dtypes dataframe)
Params:
Result: Array[(String, String)]
Returns all column names and their data types as an array.
1.6.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.792Z
(element-at expr value)
Params: (column: Column, value: Any)
Result: Column
Returns the element of the array at the given index if the column is an array, or the value for the given key if the column is a map.
2.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.318Z
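A sketch of indexing into an array column; note that Spark's `element_at` uses 1-based indexing for arrays. The `:xs` column, the `g` alias and `g/select`/`g/show` are assumptions.

```clojure
;; First element of the array column :xs.
(-> dataframe
    (g/select (g/element-at :xs 1))
    (g/show))
```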
(empty? dataframe)
Params:
Result: Boolean
Returns true if the Dataset is empty.
2.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.840Z
(encode expr charset)
Params: (value: Column, charset: String)
Result: Column
Computes the first argument into a binary from a string using the provided character set (one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16'). If either argument is null, the result will also be null.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.319Z
(ends-with expr literal)
Params: (other: Column)
Result: Column
String ends with. Returns a boolean column based on a string match.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.898Z
(estimate-count cms item)
Params: (item: Any)
Result: Long
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/util/sketch/CountMinSketch.html
Timestamp: 2020-10-19T01:56:26.104Z
(even? expr)
Returns true if `expr` is even, else false.
(except dataframe other)
Params: (other: Dataset[T])
Result: Dataset[T]
Returns a new Dataset containing rows in this Dataset but not in another Dataset. This is equivalent to EXCEPT DISTINCT in SQL.
2.0.0
Equality checking is performed directly on the encoded representation of the data and thus is not affected by a custom equals function defined on T.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.796Z
(except-all dataframe other)
Params: (other: Dataset[T])
Result: Dataset[T]
Returns a new Dataset containing rows in this Dataset but not in another Dataset while preserving the duplicates. This is equivalent to EXCEPT ALL in SQL.
2.4.0
Equality checking is performed directly on the encoded representation of the data and thus is not affected by a custom equals function defined on T. Also as standard in SQL, this function resolves columns by position (not by name).
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.798Z
(exists expr predicate)
Params: (column: Column, f: (Column) ⇒ Column)
Result: Column
Returns whether a predicate holds for one or more elements in the array.
the input array column
col => predicate, the Boolean predicate to check the input column
3.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.322Z
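A hedged sketch, assuming the predicate can be passed as an ordinary Clojure function of one Column, as the `(Column) ⇒ Column` parameter suggests; the `:xs` column and the `g` alias are made up. `forall`, documented below, follows the same shape.

```clojure
;; Does any element of :xs exceed 100?  Are all elements positive?
(-> dataframe
    (g/select (g/exists :xs (fn [x] (g/< 100 x)))
              (g/forall :xs (fn [x] (g/< 0 x))))
    (g/show))
```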
(exp expr)
Params: (e: Column)
Result: Column
Computes the exponential of the given value.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.324Z
(expected-fpp bloom)
Params: ()
Result: Double
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/util/sketch/BloomFilter.html
Timestamp: 2020-10-19T01:56:25.739Z
Column: Prints the expression to the console for debugging purposes.
Dataset: Prints the physical plan to the console for debugging purposes.
(explode expr)
Params: (e: Column)
Result: Column
Creates a new row for each element in the given array or map column. Uses the default column name col for elements in the array and key and value for elements in the map unless specified otherwise.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.325Z
(expm-1 expr)
Params: (e: Column)
Result: Column
Computes the exponential of the given value minus one.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.329Z
(expm1 expr)
Params: (e: Column)
Result: Column
Computes the exponential of the given value minus one.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.329Z
(expr s)
Params: (expr: String)
Result: Column
Parses the expression string into the column that it represents, similar to Dataset#selectExpr.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.330Z
(factorial expr)
Params: (e: Column)
Result: Column
Computes the factorial of the given value.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.331Z
(fill-na dataframe value)
(fill-na dataframe value cols)
Params: (value: Long)
Result: DataFrame
Returns a new DataFrame that replaces null or NaN values in numeric columns with value.
2.2.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/DataFrameNaFunctions.html
Timestamp: 2020-10-19T01:56:23.908Z
Column: Returns an array of elements for which a predicate holds in a given array.
Dataset: Filters rows using the given condition.
Column: Aggregate function: returns the first value of a column in a group.
Dataset: Returns the first row.
(first-vals dataframe)
Returns the vector values of the first row in the Dataset collected.
(flatten expr)
Params: (e: Column)
Result: Column
Creates a single array from an array of arrays. If a structure of nested arrays is deeper than two levels, only one level of nesting is removed.
2.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.345Z
(floor expr)
Params: (e: Column)
Result: Column
Computes the floor of the given value.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.347Z
(forall expr predicate)
Params: (column: Column, f: (Column) ⇒ Column)
Result: Column
Returns whether a predicate holds for every element in the array.
the input array column
col => predicate, the Boolean predicate to check the input column
3.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.349Z
(format-number expr decimal-places)
Params: (x: Column, d: Int)
Result: Column
Formats numeric column x to a format like '#,###,###.##', rounded to d decimal places with HALF_EVEN round mode, and returns the result as a string column.
If d is 0, the result has no decimal point or fractional part. If d is less than 0, the result will be null.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.350Z
(format-string fmt & exprs)
Params: (format: String, arguments: Column*)
Result: Column
Formats the arguments in printf-style and returns the result as a string column.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.351Z
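For example, printf-style formatting of two columns into one string column; the `:category` and `:id` columns, the `g` alias and `g/select`/`g/show` are assumptions.

```clojure
;; e.g. "books-00042" for :category "books" and integer :id 42.
(-> dataframe
    (g/select (g/format-string "%s-%05d" :category :id))
    (g/show))
```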
(freq-items dataframe col-names)
(freq-items dataframe col-names support)
Params: (cols: Array[String], support: Double)
Result: DataFrame
Finds frequent items for columns, possibly with false positives, using the frequent element count algorithm proposed by Karp, Schenker, and Papadimitriou. The support should be greater than 1e-4.
This function is meant for exploratory data analysis, as we make no guarantee about the backward compatibility of the schema of the resulting DataFrame.
the names of the columns to search frequent items in.
The minimum frequency for an item to be considered frequent. Should be greater than 1e-4.
A Local DataFrame with the Array of frequent items for each column.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/DataFrameStatFunctions.html
Timestamp: 2020-10-19T01:56:24.676Z
(from-csv expr schema)
(from-csv expr schema options)
Params: (e: Column, schema: StructType, options: Map[String, String])
Result: Column
Parses a column containing a CSV string into a StructType with the specified schema. Returns null, in the case of an unparseable string.
a string column containing CSV data.
the schema to use when parsing the CSV string
options to control how the CSV is parsed. Accepts the same options as the CSV data source.
3.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.354Z
(from-json expr schema)
(from-json expr schema options)
Params: (e: Column, schema: StructType, options: Map[String, String])
Result: Column
(Scala-specific) Parses a column containing a JSON string into a StructType with the specified schema. Returns null in the case of an unparseable string.
a string column containing JSON data.
the schema to use when parsing the json string
options to control how the json is parsed. Accepts the same options as the json data source.
2.1.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.372Z
(from-unixtime expr)
(from-unixtime expr fmt)
Params: (ut: Column)
Result: Column
Converts the number of seconds from unix epoch (1970-01-01 00:00:00 UTC) to a string representing the timestamp of that moment in the current system time zone in the yyyy-MM-dd HH:mm:ss format.
A number of a type that is castable to a long, such as string or integer. Can be negative for timestamps before the unix epoch
A string, or null if the input was a string that could not be cast to a long
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.375Z
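A sketch of both arities, assuming the `g` alias and a `g/select` function as used elsewhere in this library; the rendered strings depend on the session time zone:
```clojure
;; Render epoch seconds with the default pattern and with an explicit one.
(-> (g/map->dataset {:epoch [0 1577836800]})
    (g/select (g/from-unixtime :epoch)
              (g/from-unixtime :epoch "yyyy-MM-dd"))
    g/show)
;; 1577836800 s corresponds to 2020-01-01 00:00:00 UTC; the exact output
;; depends on the current system time zone.
```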
(get-checkpoint-dir)
(get-checkpoint-dir spark)
Params:
Result: Optional[String]
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html
Timestamp: 2020-10-19T01:56:49.509Z
(get-conf)
(get-conf spark)
Params:
Result: SparkConf
Return a copy of this JavaSparkContext's configuration. The configuration cannot be changed at runtime.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html
Timestamp: 2020-10-19T01:56:49.511Z
(get-field expr field-name)
Params: (fieldName: String)
Result: Column
An expression that gets a field by name in a StructType.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.913Z
(get-item expr k)
Params: (key: Any)
Result: Column
An expression that gets an item at position ordinal out of an array, or gets a value by key key in a MapType.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.915Z
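A sketch of get-item on a map column built with the `map` function documented further below; `g/with-column` and keyword-to-column coercion are assumed:
```clojure
;; Build a single-entry map column per row, then look a value up by key.
(-> (g/map->dataset {:k ["a" "b"] :v [1 2]})
    (g/with-column :m (g/map :k :v))           ; map column {k -> v}
    (g/with-column :a-val (g/get-item :m "a")) ; value under key "a", or null
    g/show)
```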
(get-local-property k)
(get-local-property spark k)
Params: (key: String)
Result: String
Get a local property set in this thread, or null if it is missing. See org.apache.spark.api.java.JavaSparkContext.setLocalProperty.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html
Timestamp: 2020-10-19T01:56:49.512Z
(get-persistent-rdds)
(get-persistent-rdds spark)
Params:
Result: Map[Integer, JavaRDD[_]]
Returns a Java map of JavaRDDs that have marked themselves as persistent via cache() call.
This does not necessarily mean the caching or computation was successful.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html
Timestamp: 2020-10-19T01:56:49.513Z
(get-spark-home)
(get-spark-home spark)
Params: ()
Result: Optional[String]
Get Spark's home location from either a value set through the constructor, or the spark.home Java property, or the SPARK_HOME environment variable (in that order of preference). If none of these is set, return None.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html
Timestamp: 2020-10-19T01:56:49.518Z
(greatest & exprs)
Params: (exprs: Column*)
Result: Column
Returns the greatest value of the list of values, skipping null values. This function takes at least 2 parameters. It will return null iff all parameters are null.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.382Z
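A row-wise sketch, assuming `g/select`; `least` (documented further below) is shown alongside for contrast:
```clojure
;; Per-row maximum and minimum across two columns.
(-> (g/map->dataset {:a [1 5 7] :b [4 2 9]})
    (g/select (g/greatest :a :b) (g/least :a :b))
    g/show)
```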
(group-by dataframe & exprs)
Params: (cols: Column*)
Result: RelationalGroupedDataset
Groups the Dataset using the specified columns, so we can run aggregation on them. See RelationalGroupedDataset for all the available aggregate functions.
2.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.827Z
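A sketch of grouping plus aggregation, assuming `g/agg` and the `mean` aggregate described elsewhere in this document; the data is illustrative:
```clojure
;; Average price per flavour.
(-> (g/map->dataset {:flavour ["mango" "mango" "kiwi"] :price [1.0 2.0 3.0]})
    (g/group-by :flavour)
    (g/agg (g/mean :price))
    g/show)
```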
(grouping expr)
Params: (e: Column)
Result: Column
Aggregate function: indicates whether a specified column in a GROUP BY list is aggregated or not, returns 1 for aggregated or 0 for not aggregated in the result set.
2.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.388Z
(grouping-id & exprs)
Params: (cols: Column*)
Result: Column
Aggregate function: returns the level of grouping, equal to (grouping(c1) << (n-1)) + (grouping(c2) << (n-2)) + ... + grouping(cn).
2.0.0
The list of columns should match with grouping columns exactly, or empty (means all the grouping columns).
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.390Z
(hash & exprs)
Params: (cols: Column*)
Result: Column
Calculates the hash code of given columns, and returns the result as an int column.
2.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.391Z
(hash-code expr)
Params: ()
Result: Int
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.918Z
(head dataframe)
(head dataframe n-rows)
Params: (n: Int)
Result: Array[T]
Returns the first n rows.
1.6.0
this method should only be used if the resulting array is expected to be small, as all the data is loaded into the driver's memory.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.834Z
(head-vals dataframe)
(head-vals dataframe n-rows)
Returns the vector values of the first n rows in the Dataset collected.
(hex expr)
Params: (column: Column)
Result: Column
Computes hex value of the given column.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.393Z
(hint dataframe hint-name & args)
Params: (name: String, parameters: Any*)
Result: Dataset[T]
Specifies some hint on the current Dataset. As an example, one of the plans of a join can be marked as broadcastable via this method.
2.2.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.835Z
(hour expr)
Params: (e: Column)
Result: Column
Extracts the hours as an integer from a given date/timestamp/string.
An integer, or null if the input was a string that could not be cast to a date
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.394Z
(hypot left-expr right-expr)
Params: (l: Column, r: Column)
Result: Column
Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.406Z
(if condition if-expr)
(if condition if-expr else-expr)
Params: (condition: Column, value: Any)
Result: Column
Evaluates a list of conditions and returns one of multiple possible result expressions. If otherwise is not defined at the end, null is returned for unmatched conditions.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.724Z
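A sketch of a conditional column; `g/lit` is documented further below, while `g/<` and `g/with-column` are assumed to exist in this library:
```clojure
;; Label each reading as cold or warm.
(-> (g/map->dataset {:temp [10 25 35]})
    (g/with-column :label (g/if (g/< :temp 20) (g/lit "cold") (g/lit "warm")))
    g/show)
```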
(inc expr)
Returns an expression one greater than `expr`.
(initcap expr)
Params: (e: Column)
Result: Column
Returns a new string column by converting the first letter of each word to uppercase. Words are delimited by whitespace.
For example, "hello world" will become "Hello World".
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.407Z
(input-file-name)
Params: ()
Result: Column
Creates a string column for the file name of the current Spark task.
1.6.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.408Z
(input-files dataframe)
Params:
Result: Array[String]
Returns a best-effort snapshot of the files that compose this Dataset. This method simply asks each constituent BaseRelation for its respective files and takes the union of all results. Depending on the source relations, this may not find all input files. Duplicates are removed.
2.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.837Z
(instr expr substr)
Params: (str: Column, substring: String)
Result: Column
Locate the position of the first occurrence of substr column in the given string. Returns null if either of the arguments are null.
1.5.0
The position is not zero-based but a 1-based index. Returns 0 if substr could not be found in str.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.409Z
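A sketch of the 1-based position semantics, assuming `g/select`:
```clojure
;; Position of "ar" within each string: 3 in "spark", 0 in "flink" (not found).
(-> (g/map->dataset {:s ["spark" "flink"]})
    (g/select (g/instr :s "ar"))
    g/show)
```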
Column: Aggregate function: returns the inter-quartile range of the values in a group.
RelationalGroupedDataset: Compute the inter-quartile range for each numeric column for each group.
(intersect dataframe other)
Params: (other: Dataset[T])
Result: Dataset[T]
Returns a new Dataset containing rows only in both this Dataset and another Dataset. This is equivalent to INTERSECT in SQL.
1.6.0
Equality checking is performed directly on the encoded representation of the data and thus is not affected by a custom equals function defined on T.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.838Z
(intersect-all dataframe other)
Params: (other: Dataset[T])
Result: Dataset[T]
Returns a new Dataset containing rows only in both this Dataset and another Dataset while preserving the duplicates. This is equivalent to INTERSECT ALL in SQL.
2.4.0
Equality checking is performed directly on the encoded representation of the data and thus is not affected by a custom equals function defined on T. Also as standard in SQL, this function resolves columns by position (not by name).
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.839Z
Column: Aggregate function: returns the inter-quartile range of the values in a group.
RelationalGroupedDataset: Compute the inter-quartile range for each numeric column for each group.
(is-compatible bloom other)
Params: (other: BloomFilter)
Result: Boolean
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/util/sketch/BloomFilter.html
Timestamp: 2020-10-19T01:56:25.740Z
(is-empty dataframe)
Params:
Result: Boolean
Returns true if the Dataset is empty.
2.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.840Z
(is-in-collection expr coll)
Params: (values: Iterable[_])
Result: Column
A boolean expression that is evaluated to true if the value of this expression is contained by the provided collection.
Note: Since the types of the elements in the collection are inferred only at run time, the elements will be "up-casted" to the most common type for comparison. For example: 1) in the case of "Int vs String", the "Int" will be up-casted to "String" and the comparison will look like "String vs String"; 2) in the case of "Float vs Double", the "Float" will be up-casted to "Double" and the comparison will look like "Double vs Double".
2.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.924Z
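A sketch of filtering on collection membership; `g/filter` is assumed to exist in this library, and the data is illustrative:
```clojure
;; Keep rows whose colour is in a local Clojure collection.
(-> (g/map->dataset {:colour ["red" "green" "blue"]})
    (g/filter (g/is-in-collection :colour ["red" "blue"]))
    g/show)
```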
(is-local dataframe)
Params:
Result: Boolean
Returns true if the collect and take methods can be run locally (without any Spark executors).
1.6.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.843Z
(is-nan expr)
Params:
Result: Column
True if the current expression is NaN.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.927Z
(is-not-null expr)
Params:
Result: Column
True if the current expression is NOT null.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.932Z
(is-null expr)
Params:
Result: Column
True if the current expression is null.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.933Z
(is-streaming dataframe)
Params:
Result: Boolean
Returns true if this Dataset contains one or more sources that continuously return data as it arrives. A Dataset that reads data from a streaming source must be executed as a StreamingQuery using the start() method in DataStreamWriter. Methods that return a single answer, e.g. count() or collect(), will throw an AnalysisException when there is a streaming source present.
2.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.844Z
(isin expr coll)
Params: (list: Any*)
Result: Column
A boolean expression that is evaluated to true if the value of this expression is contained by the evaluated values of the arguments.
Note: Since the types of the elements in the list are inferred only at run time, the elements will be "up-casted" to the most common type for comparison. For example: 1) in the case of "Int vs String", the "Int" will be up-casted to "String" and the comparison will look like "String vs String"; 2) in the case of "Float vs Double", the "Float" will be up-casted to "Double" and the comparison will look like "Double vs Double".
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.936Z
(jars)
(jars spark)
Params:
Result: List[String]
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html
Timestamp: 2020-10-19T01:56:49.532Z
(java-spark-context spark)
Converts a SparkSession to a JavaSparkContext.
(join left right expr)
(join left right expr join-type)
Params: (right: Dataset[_])
Result: DataFrame
Join with another DataFrame.
Behaves as an INNER JOIN and requires a subsequent join predicate.
Right side of the join operation.
2.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.856Z
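A sketch of the two arities above; passing a shared column keyword as `expr` is an assumption, as is the "left" join-type string (one of the types listed under join-with below):
```clojure
;; Inner join on :id, then a left join that keeps unmatched rows from `customers`.
(def customers (g/map->dataset {:id [1 2 3] :name ["a" "b" "c"]}))
(def orders    (g/map->dataset {:id [1 1 3] :total [10 20 30]}))

(g/show (g/join customers orders :id))
(g/show (g/join customers orders :id "left"))
```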
(join-with left right condition)
(join-with left right condition join-type)
Params: (other: Dataset[U], condition: Column, joinType: String)
Result: Dataset[(T, U)]
Joins this Dataset returning a Tuple2 for each pair where condition evaluates to true.
This is similar to the relation join function with one important difference in the result schema. Since joinWith preserves objects present on either side of the join, the result schema is similarly nested into a tuple under the column names _1 and _2.
This type of join can be useful both for preserving type-safety with the original object types as well as working with relational data where either side of the join has column names in common.
Right side of the join.
Join expression.
Type of join to perform. Default inner. Must be one of: inner, cross, outer, full, fullouter, full_outer, left, leftouter, left_outer, right, rightouter, right_outer.
1.6.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.860Z
(keys expr)
Params: (e: Column)
Result: Column
Returns an unordered array containing the keys of the map.
2.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.472Z
(kurtosis expr)
Params: (e: Column)
Result: Column
Aggregate function: returns the kurtosis of the values in a group.
1.6.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.416Z
(lag expr offset)
(lag expr offset default)
Params: (e: Column, offset: Int)
Result: Column
Window function: returns the value that is offset rows before the current row, and null if there are fewer than offset rows before the current row. For example, an offset of one will return the previous row at any given point in the window partition.
This is equivalent to the LAG function in SQL.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.421Z
Column: Aggregate function: returns the last value of the column in a group.
Dataset: Returns the last row.
(last-day expr)
Params: (e: Column)
Result: Column
Returns the last day of the month which the given date belongs to. For example, input "2015-07-27" returns "2015-07-31" since July 31 is the last day of the month in July 2015.
A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
A date, or null if the input was a string that could not be cast to a date
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.431Z
(last-vals dataframe)
Returns the vector values of the last row in the Dataset collected.
(lead expr offset)
(lead expr offset default)
Params: (columnName: String, offset: Int)
Result: Column
Window function: returns the value that is offset rows after the current row, and null if there are fewer than offset rows after the current row. For example, an offset of one will return the next row at any given point in the window partition.
This is equivalent to the LEAD function in SQL.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.437Z
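A window-function sketch covering lag and lead. The `g/window` spec map and `g/over` are assumed names for this library's WindowSpec helpers, and `readings` is a hypothetical dataframe:
```clojure
;; Previous and next value per sensor, ordered by timestamp.
;; `readings` is a hypothetical dataframe with :sensor, :ts and :value columns.
(def w (g/window {:partition-by :sensor :order-by :ts}))

(-> readings
    (g/with-column :prev (g/over (g/lag :value 1) w))
    (g/with-column :next (g/over (g/lead :value 1) w))
    g/show)
```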
(least & exprs)
Params: (exprs: Column*)
Result: Column
Returns the least value of the list of values, skipping null values. This function takes at least 2 parameters. It will return null iff all parameters are null.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.439Z
(length expr)
Params: (e: Column)
Result: Column
Computes the character length of a given string or the number of bytes of a binary string. The length of character strings includes the trailing spaces. The length of binary strings includes binary zeros.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.440Z
(levenshtein left-expr right-expr)
Params: (l: Column, r: Column)
Result: Column
Computes the Levenshtein distance of the two given string columns.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.441Z
(like expr literal)
Params: (literal: String)
Result: Column
SQL like expression. Returns a boolean column based on a SQL LIKE match.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.939Z
(limit dataframe n-rows)
Params: (n: Int)
Result: Dataset[T]
Returns a new Dataset by taking the first n rows. The difference between this function and head is that head is an action and returns an array (by triggering query execution) while limit returns a new Dataset.
2.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.861Z
(lit arg)
Params: (literal: Any)
Result: Column
Creates a Column of literal value.
The passed in object is returned directly if it is already a Column. If the object is a Scala Symbol, it is converted into a Column also. Otherwise, a new Column is created to represent the literal value.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.442Z
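A sketch of attaching constant columns, assuming `g/with-column`; the values are arbitrary:
```clojure
;; Tag every row with a constant source label and version number.
(-> (g/map->dataset {:x [1 2]})
    (g/with-column :source  (g/lit "manual"))
    (g/with-column :version (g/lit 3))
    g/show)
```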
(local? dataframe)
Params:
Result: Boolean
Returns true if the collect and take methods can be run locally (without any Spark executors).
1.6.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.843Z
(locate substr expr)
Params: (substr: String, str: Column)
Result: Column
Locate the position of the first occurrence of substr.
1.5.0
The position is not zero-based but a 1-based index. Returns 0 if substr could not be found in str.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.445Z
(log expr)
Params: (e: Column)
Result: Column
Computes the natural logarithm of the given value.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.449Z
(log-10 expr)
Params: (e: Column)
Result: Column
Computes the logarithm of the given value in base 10.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.451Z
(log-1p expr)
Params: (e: Column)
Result: Column
Computes the natural logarithm of the given value plus one.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.453Z
(log-2 expr)
Params: (expr: Column)
Result: Column
Computes the logarithm of the given column in base 2.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.455Z
(log10 expr)
Params: (e: Column)
Result: Column
Computes the logarithm of the given value in base 10.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.451Z
(log1p expr)
Params: (e: Column)
Result: Column
Computes the natural logarithm of the given value plus one.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.453Z
(log2 expr)
Params: (expr: Column)
Result: Column
Computes the logarithm of the given column in base 2.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.455Z
(lower expr)
Params: (e: Column)
Result: Column
Converts a string column to lower case.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.457Z
(lpad expr length pad)
Params: (str: Column, len: Int, pad: String)
Result: Column
Left-pad the string column with pad to a length of len. If the string column is longer than len, the return value is shortened to len characters.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.458Z
(ltrim expr)
Params: (e: Column)
Result: Column
Trim the spaces from the left end of the specified string value.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.460Z
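A sketch chaining a few of the string helpers above (lower, lpad, ltrim), assuming `g/select`; the data is illustrative:
```clojure
;; Normalise a code column three different ways.
(-> (g/map->dataset {:code ["  AB1" "C23"]})
    (g/select (g/lower :code)
              (g/lpad :code 6 "0")
              (g/ltrim :code))
    g/show)
```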
(map & exprs)
Params: (cols: Column*)
Result: Column
Creates a new map column. The input columns must be grouped as key-value pairs, e.g. (key1, value1, key2, value2, ...). The key columns must all have the same data type, and can't be null. The value columns must all have the same data type.
2.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.461Z
(map->dataset map-of-values)
(map->dataset spark map-of-values)
Construct a Dataset from an associative map.
(g/show (g/map->dataset {:a [1 2], :b [3 4]}))
; +---+---+
; |a |b |
; +---+---+
; |1 |3 |
; |2 |4 |
; +---+---+
(map-concat & exprs)
Params: (cols: Column*)
Result: Column
Returns the union of all the given maps.
2.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.462Z
(map-entries expr)
Params: (e: Column)
Result: Column
Returns an unordered array of all entries in the given map.
3.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.463Z
(map-filter expr predicate)
Params: (expr: Column, f: (Column, Column) ⇒ Column)
Result: Column
Returns a map whose key-value pairs satisfy a predicate.
the input map column
(key, value) => predicate, the Boolean predicate to filter the input map column
3.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.465Z
(map-from-arrays key-expr val-expr)
Params: (keys: Column, values: Column)
Result: Column
Creates a new map column. The array in the first column is used for keys. The array in the second column is used for values. All elements in the array for key should not be null.
2.4
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.470Z
(map-from-entries expr)
Params: (e: Column)
Result: Column
Returns a map created from the given array of entries.
2.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.471Z
(map-keys expr)
Params: (e: Column)
Result: Column
Returns an unordered array containing the keys of the map.
2.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.472Z
(map-type key-type val-type)
Creates a MapType by specifying the data type of keys `key-type`, the data type of values `val-type`, and whether values contain any null value `nullable`.
(map-values expr)
Params: (e: Column)
Result: Column
Returns an unordered array containing the values of the map.
2.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.473Z
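A sketch building a map column and inspecting it with map-keys, map-values and map-entries; `g/with-column` and `g/select` are assumed:
```clojure
;; One map entry per row, then pull the keys, values and entries back out.
(-> (g/map->dataset {:k ["a" "b"] :v [1 2]})
    (g/with-column :m (g/map :k :v))
    (g/select (g/map-keys :m) (g/map-values :m) (g/map-entries :m))
    g/show)
```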
(map-zip-with left right merge-fn)
Params: (left: Column, right: Column, f: (Column, Column, Column) ⇒ Column)
Result: Column
Merge two given maps, key-wise into a single map using a function.
the left input map column
the right input map column
(key, value1, value2) => new_value, the lambda function to merge the map values
3.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.474Z
(master)
(master spark)
Params:
Result: String
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html
Timestamp: 2020-10-19T01:56:49.532Z
Column: Aggregate function: returns the maximum value of the column in a group.
RelationalGroupedDataset: Compute the max value for each numeric column for each group.
(md-5 expr)
Params: (e: Column)
Result: Column
Calculates the MD5 digest of a binary column and returns the value as a 32 character hex string.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.478Z
(md5 expr)
Params: (e: Column)
Result: Column
Calculates the MD5 digest of a binary column and returns the value as a 32 character hex string.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.478Z
Column: Aggregate function: returns the average of the values in a group.
RelationalGroupedDataset: Compute the average value for each numeric column for each group.
Column: Aggregate function: returns the median range of the values in a group.
RelationalGroupedDataset: Compute the median range for each numeric column for each group.
Flag for controlling the storage of an RDD.
The default behavior of the DataFrame or Dataset. With this storage level, the DataFrame is stored in JVM memory as deserialized objects. When the required storage is greater than the available memory, it spills some of the excess partitions to disk and reads them back from disk when they are needed. This is slower, as I/O is involved.
Flag for controlling the storage of an RDD.
Same as the memory-and-disk storage level, but replicates each partition to two cluster nodes.
Flag for controlling the storage of an RDD.
Same as `memory-and-disk`, the difference being that it stores the DataFrame objects serialized in memory, and on disk when space is not available.
Flag for controlling the storage of an RDD.
Same as the memory-and-disk-ser storage level, but replicates each partition to two cluster nodes.
Flag for controlling the storage of an RDD.
Flag for controlling the storage of an RDD.
Same as the `memory-only` storage level, but replicates each partition to two cluster nodes.
Flag for controlling the storage of an RDD.
Same as `memory-only`, the difference being that it stores the RDD as serialized objects in JVM memory. It takes less memory (it is space-efficient) than `memory-only`, since it keeps objects serialized, at the cost of a few extra CPU cycles to deserialize them.
Flag for controlling the storage of an RDD.
Same as the `memory-only-ser` storage level, but replicates each partition to two cluster nodes.
(merge expr & ms)
Variadic version of `map-concat`.
(merge-in-place bloom-or-cms other)
Params: (other: BloomFilter)
Result: BloomFilter
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/util/sketch/BloomFilter.html
Timestamp: 2020-10-19T01:56:25.741Z
(merge-with left right merge-fn)
Params: (left: Column, right: Column, f: (Column, Column, Column) ⇒ Column)
Result: Column
Merge two given maps, key-wise into a single map using a function.
the left input map column
the right input map column
(key, value1, value2) => new_value, the lambda function to merge the map values
3.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.474Z
(might-contain bloom item)
Params: (item: Any)
Result: Boolean
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/util/sketch/BloomFilter.html
Timestamp: 2020-10-19T01:56:25.742Z
Params: (item: Any) Result: Boolean Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/util/sketch/BloomFilter.html Timestamp: 2020-10-19T01:56:25.742Z
Column: Aggregate function: returns the minimum value of the column in a group.
RelationalGroupedDataset: Compute the min value for each numeric columns for each group.
Column: Aggregate function: returns the minimum value of the column in a group. RelationalGroupedDataset: Compute the min value for each numeric columns for each group.
(minute expr)
Params: (e: Column)
Result: Column
Extracts the minutes as an integer from a given date/timestamp/string.
An integer, or null if the input was a string that could not be cast to a date
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.483Z
Params: (e: Column) Result: Column Extracts the minutes as an integer from a given date/timestamp/string. An integer, or null if the input was a string that could not be cast to a date 1.5.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.483Z
Params: (other: Any)
Result: Column
Modulo (a.k.a. remainder) expression.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.958Z
Params: (other: Any) Result: Column Modulo (a.k.a. remainder) expression. 1.3.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html Timestamp: 2020-10-19T01:56:19.958Z
(monotonically-increasing-id)
Params: ()
Result: Column
A column expression that generates monotonically increasing 64-bit integers.
The generated ID is guaranteed to be monotonically increasing and unique, but not consecutive. The current implementation puts the partition ID in the upper 31 bits, and the record number within each partition in the lower 33 bits. The assumption is that the data frame has less than 1 billion partitions, and each partition has less than 8 billion records.
As an example, consider a DataFrame with two partitions, each with 3 records. This expression would return the following IDs:
(Since version 2.0.0) Use monotonically_increasing_id()
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.744Z
Params: () Result: Column A column expression that generates monotonically increasing 64-bit integers. The generated ID is guaranteed to be monotonically increasing and unique, but not consecutive. The current implementation puts the partition ID in the upper 31 bits, and the record number within each partition in the lower 33 bits. The assumption is that the data frame has less than 1 billion partitions, and each partition has less than 8 billion records. As an example, consider a DataFrame with two partitions, each with 3 records. This expression would return the following IDs: (Since version 2.0.0) Use monotonically_increasing_id() 1.4.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.744Z
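A minimal sketch of attaching a surrogate ID column, assuming `g/with-column` is available as elsewhere in this library. For the two-partition, three-record example above, the upstream docs give the IDs 0, 1, 2, 8589934592 (1L << 33), 8589934593, 8589934594.
```clojure
;; Attach a unique, monotonically increasing (but not consecutive) 64-bit ID.
(-> dataframe
    (g/with-column :id (g/monotonically-increasing-id))
    g/show)
```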
(month expr)
Params: (e: Column)
Result: Column
Extracts the month as an integer from a given date/timestamp/string.
An integer, or null if the input was a string that could not be cast to a date
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.486Z
Params: (e: Column) Result: Column Extracts the month as an integer from a given date/timestamp/string. An integer, or null if the input was a string that could not be cast to a date 1.5.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.486Z
(months-between l-expr r-expr)
Params: (end: Column, start: Column)
Result: Column
Returns number of months between dates start and end.
A whole number is returned if both inputs have the same day of month or both are the last day of their respective months. Otherwise, the difference is calculated assuming 31 days per month.
For example:
A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
A date, timestamp or string. If a string, the data must be in a format that can cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
A double, or null if either end or start were strings that could not be cast to a timestamp. Negative if end is before start
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.490Z
Params: (end: Column, start: Column) Result: Column Returns number of months between dates start and end. A whole number is returned if both inputs have the same day of month or both are the last day of their respective months. Otherwise, the difference is calculated assuming 31 days per month. For example: A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS A date, timestamp or string. If a string, the data must be in a format that can cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS A double, or null if either end or start were strings that could not be cast to a timestamp. Negative if end is before start 1.5.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.490Z
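A minimal sketch of months-between, assuming `g/lit` is available as elsewhere in this library; because both literals share the same day of month, the result is a whole number.
```clojure
;; 2020-03-15 is exactly two months after 2020-01-15.
(g/months-between (g/lit "2020-03-15") (g/lit "2020-01-15"))
;; => a Column evaluating to 2.0
```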
(name-value-seq->dataset map-of-values)
(name-value-seq->dataset spark map-of-values)
Construct a Dataset from an associative map.
(g/show (g/map->dataset {:a [1 2], :b [3 4]}))
; +---+---+
; |a |b |
; +---+---+
; |1 |3 |
; |2 |4 |
; +---+---+
Construct a Dataset from an associative map. ```clojure (g/show (g/map->dataset {:a [1 2], :b [3 4]})) ; +---+---+ ; |a |b | ; +---+---+ ; |1 |3 | ; |2 |4 | ; +---+---+ ```
(nan? expr)
Params:
Result: Column
True if the current expression is NaN.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.927Z
Params: Result: Column True if the current expression is NaN. 1.5.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html Timestamp: 2020-10-19T01:56:19.927Z
(nanvl left-expr right-expr)
Params: (col1: Column, col2: Column)
Result: Column
Returns col1 if it is not NaN, or col2 if col1 is NaN.
Both inputs should be floating point columns (DoubleType or FloatType).
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.492Z
Params: (col1: Column, col2: Column) Result: Column Returns col1 if it is not NaN, or col2 if col1 is NaN. Both inputs should be floating point columns (DoubleType or FloatType). 1.5.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.492Z
(neg? expr)
Returns true if expr
is less than zero, else false.
Returns true if `expr` is less than zero, else false.
(negate expr)
Params: (e: Column)
Result: Column
Unary minus, i.e. negate the expression.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.494Z
Params: (e: Column) Result: Column Unary minus, i.e. negate the expression. 1.3.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.494Z
(next-day expr day-of-week)
Params: (date: Column, dayOfWeek: String)
Result: Column
Returns the first date which is later than the value of the date column that is on the specified day of the week.
For example, next_day('2015-07-27', "Sunday") returns 2015-08-02 because that is the first Sunday after 2015-07-27.
A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
Case insensitive, and accepts: "Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"
A date, or null if date was a string that could not be cast to a date or if dayOfWeek was an invalid value
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.495Z
Params: (date: Column, dayOfWeek: String) Result: Column Returns the first date which is later than the value of the date column that is on the specified day of the week. For example, next_day('2015-07-27', "Sunday") returns 2015-08-02 because that is the first Sunday after 2015-07-27. A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS Case insensitive, and accepts: "Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun" A date, or null if date was a string that could not be cast to a date or if dayOfWeek was an invalid value 1.5.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.495Z
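A minimal sketch of next-day, using the example from the upstream docs; `g/lit` is assumed to be available as elsewhere in this library.
```clojure
;; First Sunday strictly after 2015-07-27.
(g/next-day (g/lit "2015-07-27") "Sunday")
;; => a Column evaluating to 2015-08-02
```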
(nlargest dataframe n-rows expr)
Return the Dataset with the first n-rows
rows ordered by expr
in descending order.
Return the Dataset with the first `n-rows` rows ordered by `expr` in descending order.
Flag for controlling the storage of an RDD.
No caching.
Flag for controlling the storage of an RDD. No caching.
(not expr)
Params: (e: Column)
Result: Column
Inversion of boolean expression, i.e. NOT.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.497Z
Params: (e: Column) Result: Column Inversion of boolean expression, i.e. NOT. 1.3.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.497Z
(not-null? expr)
Params:
Result: Column
True if the current expression is NOT null.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.932Z
Params: Result: Column True if the current expression is NOT null. 1.3.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html Timestamp: 2020-10-19T01:56:19.932Z
(nsmallest dataframe n-rows expr)
Return the Dataset with the first n-rows
rows ordered by expr
in ascending order.
Return the Dataset with the first `n-rows` rows ordered by `expr` in ascending order.
(ntile n)
Params: (n: Int)
Result: Column
Window function: returns the ntile group id (from 1 to n inclusive) in an ordered window partition. For example, if n is 4, the first quarter of the rows will get value 1, the second quarter will get 2, the third quarter will get 3, and the last quarter will get 4.
This is equivalent to the NTILE function in SQL.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.500Z
Params: (n: Int) Result: Column Window function: returns the ntile group id (from 1 to n inclusive) in an ordered window partition. For example, if n is 4, the first quarter of the rows will get value 1, the second quarter will get 2, the third quarter will get 3, and the last quarter will get 4. This is equivalent to the NTILE function in SQL. 1.4.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.500Z
(null-count expr)
Aggregate function: returns the null count of a column.
Aggregate function: returns the null count of a column.
(null-rate expr)
Aggregate function: returns the null rate of a column.
Aggregate function: returns the null rate of a column.
(null? expr)
Params:
Result: Column
True if the current expression is null.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.933Z
Params: Result: Column True if the current expression is null. 1.3.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html Timestamp: 2020-10-19T01:56:19.933Z
(nunique dataframe)
Count distinct observations over all columns in the Dataset.
Count distinct observations over all columns in the Dataset.
(odd? expr)
Returns true if expr
is odd, else false.
Returns true if `expr` is odd, else false.
Flag for controlling the storage of an RDD.
Off-heap refers to objects (serialised to byte array) that are managed by the operating system but stored outside the process heap in native memory (therefore, they are not processed by the garbage collector). Accessing this data is slightly slower than accessing the on-heap storage but still faster than reading/writing from a disk. The downside is that the user has to manually deal with managing the allocated memory.
Flag for controlling the storage of an RDD. Off-heap refers to objects (serialised to byte array) that are managed by the operating system but stored outside the process heap in native memory (therefore, they are not processed by the garbage collector). Accessing this data is slightly slower than accessing the on-heap storage but still faster than reading/writing from a disk. The downside is that the user has to manually deal with managing the allocated memory.
(order-by dataframe & exprs)
Params: (sortCol: String, sortCols: String*)
Result: Dataset[T]
Returns a new Dataset sorted by the given expressions. This is an alias of the sort function.
2.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.884Z
Params: (sortCol: String, sortCols: String*) Result: Dataset[T] Returns a new Dataset sorted by the given expressions. This is an alias of the sort function. 2.0.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html Timestamp: 2020-10-19T01:56:20.884Z
(over column window-spec)
Params: (window: WindowSpec)
Result: Column
Defines a windowing column.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.973Z
Params: (window: WindowSpec) Result: Column Defines a windowing column. 1.4.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html Timestamp: 2020-10-19T01:56:19.973Z
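A minimal sketch of a windowed column via over, assuming `g/window` builds a WindowSpec from a map of `:partition-by`/`:order-by` and that `g/with-column` and `g/desc` are available as elsewhere in this library; `:dept` and `:salary` are hypothetical columns.
```clojure
;; Rank rows within each :dept partition by descending :salary.
(-> dataframe
    (g/with-column :rank
                   (g/over (g/rank)
                           (g/window {:partition-by :dept
                                      :order-by     (g/desc :salary)})))
    g/show)
```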
(overlay src rep pos)
(overlay src rep pos len)
Params: (src: Column, replace: Column, pos: Column, len: Column)
Result: Column
Overlay the specified portion of src with replace, starting from byte position pos of src and proceeding for len bytes.
3.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.503Z
Params: (src: Column, replace: Column, pos: Column, len: Column) Result: Column Overlay the specified portion of src with replace, starting from byte position pos of src and proceeding for len bytes. 3.0.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.503Z
(partitions dataframe)
Params:
Result: List[Partition]
Set of partitions in this RDD.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html
Timestamp: 2020-10-19T01:56:48.891Z
Params: Result: List[Partition] Set of partitions in this RDD. Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html Timestamp: 2020-10-19T01:56:48.891Z
(percent-rank)
Params: ()
Result: Column
Window function: returns the relative rank (i.e. percentile) of rows within a window partition.
This is computed by: (rank of the row within its partition - 1) / (number of rows in the partition - 1).
This is equivalent to the PERCENT_RANK function in SQL.
1.6.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.504Z
Params: () Result: Column Window function: returns the relative rank (i.e. percentile) of rows within a window partition. This is computed by: (rank of the row within its partition - 1) / (number of rows in the partition - 1). This is equivalent to the PERCENT_RANK function in SQL. 1.6.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.504Z
(persist dataframe)
(persist dataframe new-level)
Params: ()
Result: Dataset.this.type
Persist this Dataset with the default storage level (MEMORY_AND_DISK).
1.6.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.886Z
Params: () Result: Dataset.this.type Persist this Dataset with the default storage level (MEMORY_AND_DISK). 1.6.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html Timestamp: 2020-10-19T01:56:20.886Z
The double value that is closer than any other to pi, the ratio of the circumference of a circle to its diameter.
The double value that is closer than any other to pi, the ratio of the circumference of a circle to its diameter.
(pivot grouped expr)
(pivot grouped expr values)
Params: (pivotColumn: String)
Result: RelationalGroupedDataset
Pivots a column of the current DataFrame and performs the specified aggregation.
There are two versions of pivot function: one that requires the caller to specify the list of distinct values to pivot on, and one that does not. The latter is more concise but less efficient, because Spark needs to first compute the list of distinct values internally.
Name of the column to pivot.
1.6.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/RelationalGroupedDataset.html
Timestamp: 2020-10-19T01:56:23.317Z
Params: (pivotColumn: String) Result: RelationalGroupedDataset Pivots a column of the current DataFrame and performs the specified aggregation. There are two versions of pivot function: one that requires the caller to specify the list of distinct values to pivot on, and one that does not. The latter is more concise but less efficient, because Spark needs to first compute the list of distinct values internally. Name of the column to pivot. 1.6.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/RelationalGroupedDataset.html Timestamp: 2020-10-19T01:56:23.317Z
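A minimal sketch of pivot, assuming `g/group-by`, `g/agg`, and `g/sum` are available as elsewhere in this library; `:year`, `:course`, and `:earnings` are hypothetical columns. Supplying the value list avoids the extra pass needed to compute distinct values.
```clojure
;; One output column per course, holding the summed earnings per year.
(-> dataframe
    (g/group-by :year)
    (g/pivot :course ["dotNET" "Java"])
    (g/agg (g/sum :earnings))
    g/show)
```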
(pmod left-expr right-expr)
Params: (dividend: Column, divisor: Column)
Result: Column
Returns the positive value of dividend mod divisor.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.505Z
Params: (dividend: Column, divisor: Column) Result: Column Returns the positive value of dividend mod divisor. 1.5.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.505Z
(pos? expr)
Returns true if expr
is greater than zero, else false.
Returns true if `expr` is greater than zero, else false.
(posexplode expr)
Params: (e: Column)
Result: Column
Creates a new row for each element with position in the given array or map column. Uses the default column name pos for position, and col for elements in the array and key and value for elements in the map unless specified otherwise.
2.1.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.506Z
Params: (e: Column) Result: Column Creates a new row for each element with position in the given array or map column. Uses the default column name pos for position, and col for elements in the array and key and value for elements in the map unless specified otherwise. 2.1.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.506Z
(posexplode-outer expr)
Params: (e: Column)
Result: Column
Creates a new row for each element with position in the given array or map column. Uses the default column name pos for position, and col for elements in the array and key and value for elements in the map unless specified otherwise.
2.1.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.506Z
Params: (e: Column) Result: Column Creates a new row for each element with position in the given array or map column. Uses the default column name pos for position, and col for elements in the array and key and value for elements in the map unless specified otherwise. 2.1.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.506Z
(pow base exponent)
Params: (l: Column, r: Column)
Result: Column
Returns the value of the first argument raised to the power of the second argument.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.520Z
Params: (l: Column, r: Column) Result: Column Returns the value of the first argument raised to the power of the second argument. 1.4.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.520Z
(print-schema dataframe)
Params: ()
Result: Unit
Prints the schema to the console in a nice tree format.
1.6.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.888Z
Params: () Result: Unit Prints the schema to the console in a nice tree format. 1.6.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html Timestamp: 2020-10-19T01:56:20.888Z
(put bloom item)
Params: (item: Any)
Result: Boolean
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/util/sketch/BloomFilter.html
Timestamp: 2020-10-19T01:56:25.746Z
Params: (item: Any) Result: Boolean Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/util/sketch/BloomFilter.html Timestamp: 2020-10-19T01:56:25.746Z
(qcut expr num-buckets-or-probs)
Returns a new Column that discretises expr
into equal-sized buckets, based
on rank or on sample quantiles.
Returns a new Column that discretises `expr` into equal-sized buckets, based on rank or on sample quantiles.
Column: Aggregate function: returns the quantile of the values in a group.
RelationalGroupedDataset: Compute the quantile for each numeric columns for each group.
Column: Aggregate function: returns the quantile of the values in a group. RelationalGroupedDataset: Compute the quantile for each numeric columns for each group.
(quarter expr)
Params: (e: Column)
Result: Column
Extracts the quarter as an integer from a given date/timestamp/string.
An integer, or null if the input was a string that could not be cast to a date
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.521Z
Params: (e: Column) Result: Column Extracts the quarter as an integer from a given date/timestamp/string. An integer, or null if the input was a string that could not be cast to a date 1.5.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.521Z
(radians expr)
Params: (e: Column)
Result: Column
Converts an angle measured in degrees to an approximately equivalent angle measured in radians.
angle in degrees
angle in radians, as if computed by java.lang.Math.toRadians
2.1.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.523Z
Params: (e: Column) Result: Column Converts an angle measured in degrees to an approximately equivalent angle measured in radians. angle in degrees angle in radians, as if computed by java.lang.Math.toRadians 2.1.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.523Z
(rand)
(rand seed)
Params: (seed: Long)
Result: Column
Generate a random column with independent and identically distributed (i.i.d.) samples uniformly distributed in [0.0, 1.0).
1.4.0
The function is non-deterministic in general case.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.526Z
Params: (seed: Long) Result: Column Generate a random column with independent and identically distributed (i.i.d.) samples uniformly distributed in [0.0, 1.0). 1.4.0 The function is non-deterministic in general case. Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.526Z
(rand-nth dataframe)
Returns a random row collected.
Returns a random row collected.
(randn)
(randn seed)
Params: (seed: Long)
Result: Column
Generate a column with independent and identically distributed (i.i.d.) samples from the standard normal distribution.
1.4.0
The function is non-deterministic in general case.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.528Z
Params: (seed: Long) Result: Column Generate a column with independent and identically distributed (i.i.d.) samples from the standard normal distribution. 1.4.0 The function is non-deterministic in general case. Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.528Z
(random-choice choices)
(random-choice choices probs)
(random-choice choices probs seed)
Returns a new Column of a random sample from a given collection of choices
.
Returns a new Column of a random sample from a given collection of `choices`.
(random-exp)
(random-exp rate)
(random-exp rate seed)
Returns a new Column of draws from an exponential distribution.
Returns a new Column of draws from an exponential distribution.
(random-int)
(random-int low high)
(random-int low high seed)
Returns a new Column of random integers from low
(inclusive) to high
(exclusive).
Returns a new Column of random integers from `low` (inclusive) to `high` (exclusive).
(random-norm)
(random-norm mu sigma)
(random-norm mu sigma seed)
Returns a new Column of draws from a normal distribution.
Returns a new Column of draws from a normal distribution.
(random-split dataframe weights)
(random-split dataframe weights seed)
Params: (weights: Array[Double], seed: Long)
Result: Array[Dataset[T]]
Randomly splits this Dataset with the provided weights.
weights for splits, will be normalized if they don't sum to 1.
Seed for sampling. For Java API, use randomSplitAsList.
2.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.892Z
Params: (weights: Array[Double], seed: Long) Result: Array[Dataset[T]] Randomly splits this Dataset with the provided weights. weights for splits, will be normalized if they don't sum to 1. Seed for sampling. For Java API, use randomSplitAsList. 2.0.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html Timestamp: 2020-10-19T01:56:20.892Z
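A minimal sketch of an 80/20 train-test split with a fixed seed, assuming the returned splits can be destructured as a sequence and that `g/count` is available as elsewhere in this library.
```clojure
;; Weights are normalised if they do not sum to 1.
(let [[train test] (g/random-split dataframe [0.8 0.2] 42)]
  {:train-rows (g/count train)
   :test-rows  (g/count test)})
```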
(random-uniform)
(random-uniform low high)
(random-uniform low high seed)
Returns a new Column of draws from a uniform distribution.
Returns a new Column of draws from a uniform distribution.
Creates a Dataset
with a single LongType
column named id
.
The Dataset
contains elements in a range from start
(default 0) to end
(exclusive)
with the given step
(default 1).
If num-partitions
is specified, the dataset will be distributed into the specified number
of partitions. Otherwise, Spark uses internal logic to determine the number of partitions.
Creates a `Dataset` with a single `LongType` column named `id`. The `Dataset` contains elements in a range from `start` (default 0) to `end` (exclusive) with the given `step` (default 1). If `num-partitions` is specified, the dataset will be distributed into the specified number of partitions. Otherwise, Spark uses internal logic to determine the number of partitions.
(rank)
Params: ()
Result: Column
Window function: returns the rank of rows within a window partition.
The difference between rank and dense_rank is that dense_rank leaves no gaps in the ranking sequence when there are ties. That is, if you were ranking a competition using dense_rank and had three people tie for second place, you would say that all three were in second place and that the next person came in third. Rank would give sequential numbers, so the person that came in third place (after the ties) would register as coming in fifth.
This is equivalent to the RANK function in SQL.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.529Z
Params: () Result: Column Window function: returns the rank of rows within a window partition. The difference between rank and dense_rank is that dense_rank leaves no gaps in the ranking sequence when there are ties. That is, if you were ranking a competition using dense_rank and had three people tie for second place, you would say that all three were in second place and that the next person came in third. Rank would give sequential numbers, so the person that came in third place (after the ties) would register as coming in fifth. This is equivalent to the RANK function in SQL. 1.4.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.529Z
(rchoice choices)
(rchoice choices probs)
(rchoice choices probs seed)
Returns a new Column of a random sample from a given collection of choices
.
Returns a new Column of a random sample from a given collection of `choices`.
(rdd dataframe)
Params:
Result: RDD[T]
Represents the content of the Dataset as an RDD of T.
1.6.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.894Z
Params: Result: RDD[T] Represents the content of the Dataset as an RDD of T. 1.6.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html Timestamp: 2020-10-19T01:56:20.894Z
Loads an Avro file and returns the results as a DataFrame.
Spark's DataFrameReader options may be passed in as a map of options.
See: https://spark.apache.org/docs/latest/sql-data-sources.html
Loads an Avro file and returns the results as a DataFrame. Spark's DataFrameReader options may be passed in as a map of options. See: https://spark.apache.org/docs/latest/sql-data-sources.html
Loads a binary file and returns the results as a DataFrame.
Spark's DataFrameReader options may be passed in as a map of options.
See: https://spark.apache.org/docs/latest/sql-data-sources-binaryFile.html
Loads a binary file and returns the results as a DataFrame. Spark's DataFrameReader options may be passed in as a map of options. See: https://spark.apache.org/docs/latest/sql-data-sources-binaryFile.html
Loads a CSV file and returns the results as a DataFrame.
Spark's DataFrameReader options may be passed in as a map of options.
See: https://spark.apache.org/docs/latest/sql-data-sources.html
Loads a CSV file and returns the results as a DataFrame. Spark's DataFrameReader options may be passed in as a map of options. See: https://spark.apache.org/docs/latest/sql-data-sources.html
Loads an EDN file and returns the results as a DataFrame.
Loads an EDN file and returns the results as a DataFrame.
(read-jdbc! options)
(read-jdbc! spark options)
Loads a database table and returns the results as a DataFrame.
Spark's DataFrameReader options may be passed in as a map of options.
See: https://spark.apache.org/docs/latest/sql-data-sources.html
Loads a database table and returns the results as a DataFrame. Spark's DataFrameReader options may be passed in as a map of options. See: https://spark.apache.org/docs/latest/sql-data-sources.html
Loads a JSON file and returns the results as a DataFrame.
Spark's DataFrameReader options may be passed in as a map of options.
See: https://spark.apache.org/docs/latest/sql-data-sources.html
Loads a JSON file and returns the results as a DataFrame. Spark's DataFrameReader options may be passed in as a map of options. See: https://spark.apache.org/docs/latest/sql-data-sources.html
Loads a LIBSVM file and returns the results as a DataFrame.
Spark's DataFrameReader options may be passed in as a map of options.
See: https://spark.apache.org/docs/latest/sql-data-sources.html
Loads a LIBSVM file and returns the results as a DataFrame. Spark's DataFrameReader options may be passed in as a map of options. See: https://spark.apache.org/docs/latest/sql-data-sources.html
Loads a Parquet file and returns the results as a DataFrame.
Spark's DataFrameReader options may be passed in as a map of options.
See: https://spark.apache.org/docs/latest/sql-data-sources-parquet.html
Loads a Parquet file and returns the results as a DataFrame. Spark's DataFrameReader options may be passed in as a map of options. See: https://spark.apache.org/docs/latest/sql-data-sources-parquet.html
Reads a managed (hive) table and returns the result as a DataFrame.
Reads a managed (hive) table and returns the result as a DataFrame.
Loads a text file and returns the results as a DataFrame.
Spark's DataFrameReader options may be passed in as a map of options.
See: https://spark.apache.org/docs/latest/sql-data-sources.html
Loads a text file and returns the results as a DataFrame. Spark's DataFrameReader options may be passed in as a map of options. See: https://spark.apache.org/docs/latest/sql-data-sources.html
Loads an Excel file and returns the results as a DataFrame.
Example options:
{:header true :sheet "Sheet2"}
Loads an Excel file and returns the results as a DataFrame. Example options: ```clojure {:header true :sheet "Sheet2"} ```
(records->dataset records)
(records->dataset spark records)
Construct a Dataset from a collection of maps.
(g/show (g/records->dataset [{:a 1 :b 2} {:a 3 :b 4}]))
; +---+---+
; |a |b |
; +---+---+
; |1 |2 |
; |3 |4 |
; +---+---+
Construct a Dataset from a collection of maps. ```clojure (g/show (g/records->dataset [{:a 1 :b 2} {:a 3 :b 4}])) ; +---+---+ ; |a |b | ; +---+---+ ; |1 |2 | ; |3 |4 | ; +---+---+ ```
(regexp-extract expr regex idx)
Params: (e: Column, exp: String, groupIdx: Int)
Result: Column
Extract a specific group matched by a Java regex, from the specified string column. If the regex did not match, or the specified group did not match, an empty string is returned.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.530Z
Params: (e: Column, exp: String, groupIdx: Int) Result: Column Extract a specific group matched by a Java regex, from the specified string column. If the regex did not match, or the specified group did not match, an empty string is returned. 1.5.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.530Z
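A minimal sketch of regexp-extract using the pattern from the upstream example; `g/lit` is assumed to be available as elsewhere in this library.
```clojure
;; Extract the first capture group from a digit-dash-digit string.
(g/regexp-extract (g/lit "100-200") "(\\d+)-(\\d+)" 1)
;; => a Column evaluating to "100"
```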
(regexp-replace expr pattern-expr replacement-expr)
Params: (e: Column, pattern: String, replacement: String)
Result: Column
Replace all substrings of the specified string value that match regexp with rep.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.532Z
Params: (e: Column, pattern: String, replacement: String) Result: Column Replace all substrings of the specified string value that match regexp with rep. 1.5.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.532Z
(relative-error cms)
Params: ()
Result: Double
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/util/sketch/CountMinSketch.html
Timestamp: 2020-10-19T01:56:26.106Z
Params: () Result: Double Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/util/sketch/CountMinSketch.html Timestamp: 2020-10-19T01:56:26.106Z
(remove dataframe expr)
Returns a new Dataset that only contains elements where the predicate expr evaluates to false.
Returns a new Dataset that only contains elements where the predicate expr evaluates to false.
(rename-columns dataframe rename-map)
Returns a new Dataset with a column renamed according to the rename-map.
Returns a new Dataset with a column renamed according to the rename-map.
(rename-keys expr kmap)
Same as transform-keys
with a map arg.
Same as `transform-keys` with a map arg.
(repartition dataframe & args)
Params: (numPartitions: Int)
Result: Dataset[T]
Returns a new Dataset that has exactly numPartitions partitions.
1.6.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.901Z
Params: (numPartitions: Int) Result: Dataset[T] Returns a new Dataset that has exactly numPartitions partitions. 1.6.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html Timestamp: 2020-10-19T01:56:20.901Z
(repartition-by-range dataframe & args)
Params: (numPartitions: Int, partitionExprs: Column*)
Result: Dataset[T]
Returns a new Dataset partitioned by the given partitioning expressions into numPartitions. The resulting Dataset is range partitioned.
At least one partition-by expression must be specified. When no explicit sort order is specified, "ascending nulls first" is assumed. Note, the rows are not sorted in each partition of the resulting Dataset.
Note that due to performance reasons this method uses sampling to estimate the ranges. Hence, the output may not be consistent, since sampling can return different values. The sample size can be controlled by the config spark.sql.execution.rangeExchange.sampleSizePerPartition.
2.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.904Z
Params: (numPartitions: Int, partitionExprs: Column*) Result: Dataset[T] Returns a new Dataset partitioned by the given partitioning expressions into numPartitions. The resulting Dataset is range partitioned. At least one partition-by expression must be specified. When no explicit sort order is specified, "ascending nulls first" is assumed. Note, the rows are not sorted in each partition of the resulting Dataset. Note that due to performance reasons this method uses sampling to estimate the ranges. Hence, the output may not be consistent, since sampling can return different values. The sample size can be controlled by the config spark.sql.execution.rangeExchange.sampleSizePerPartition. 2.3.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html Timestamp: 2020-10-19T01:56:20.904Z
(replace expr lookup-map)
(replace expr from-value-or-values to-value)
Returns a new Column where from-value-or-values
is replaced with to-value
.
Returns a new Column where `from-value-or-values` is replaced with `to-value`.
(replace-na dataframe cols replacement)
Params: (col: String, replacement: Map[T, T])
Result: DataFrame
Replaces values matching keys in replacement map with the corresponding values.
name of the column to apply the value replacement. If col is "*", replacement is applied on all string, numeric or boolean columns.
value replacement map. Key and value of replacement map must have the same type, and can only be doubles, strings or booleans. The map value can have nulls.
1.3.1
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/DataFrameNaFunctions.html
Timestamp: 2020-10-19T01:56:23.927Z
Params: (col: String, replacement: Map[T, T]) Result: DataFrame Replaces values matching keys in replacement map with the corresponding values. name of the column to apply the value replacement. If col is "*", replacement is applied on all string, numeric or boolean columns. value replacement map. Key and value of replacement map must have the same type, and can only be doubles, strings or booleans. The map value can have nulls. 1.3.1 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/DataFrameNaFunctions.html Timestamp: 2020-10-19T01:56:23.927Z
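A minimal sketch of replace-na with a hypothetical name column holding the sentinel string "UNKNOWN"; the vector-of-strings form for cols is an assumption, and key and value in the replacement map must share a type, as noted above.
```clojure
;; Replace the sentinel "UNKNOWN" with "unnamed" in the name column.
(g/replace-na dataframe ["name"] {"UNKNOWN" "unnamed"})
```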
(resources)
(resources spark)
Params:
Result: Map[String, ResourceInformation]
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html
Timestamp: 2020-10-19T01:56:49.550Z
Params: Result: Map[String, ResourceInformation] Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html Timestamp: 2020-10-19T01:56:49.550Z
(reverse expr)
Params: (e: Column)
Result: Column
Returns a reversed string or an array with reverse order of elements.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.534Z
Params: (e: Column) Result: Column Returns a reversed string or an array with reverse order of elements. 1.5.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.534Z
(rexp)
(rexp rate)
(rexp rate seed)
Returns a new Column of draws from an exponential distribution.
Returns a new Column of draws from an exponential distribution.
(rint expr)
Params: (e: Column)
Result: Column
Returns the double value that is closest in value to the argument and is equal to a mathematical integer.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.536Z
Params: (e: Column) Result: Column Returns the double value that is closest in value to the argument and is equal to a mathematical integer. 1.4.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.536Z
(rlike expr literal)
Params: (literal: String)
Result: Column
SQL RLIKE expression (LIKE with Regex). Returns a boolean column based on a regex match.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.977Z
Params: (literal: String) Result: Column SQL RLIKE expression (LIKE with Regex). Returns a boolean column based on a regex match. 1.3.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html Timestamp: 2020-10-19T01:56:19.977Z
(rnorm)
(rnorm mu sigma)
(rnorm mu sigma seed)
Returns a new Column of draws from a normal distribution.
Returns a new Column of draws from a normal distribution.
(rollup dataframe & exprs)
Params: (cols: Column*)
Result: RelationalGroupedDataset
Create a multi-dimensional rollup for the current Dataset using the specified columns, so we can run aggregation on them. See RelationalGroupedDataset for all the available aggregate functions.
2.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.907Z
Params: (cols: Column*) Result: RelationalGroupedDataset Create a multi-dimensional rollup for the current Dataset using the specified columns, so we can run aggregation on them. See RelationalGroupedDataset for all the available aggregate functions. 2.0.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html Timestamp: 2020-10-19T01:56:20.907Z
(round expr)
Params: (e: Column)
Result: Column
Returns the value of the column e rounded to 0 decimal places with HALF_UP round mode.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.539Z
Params: (e: Column) Result: Column Returns the value of the column e rounded to 0 decimal places with HALF_UP round mode. 1.5.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.539Z
(row & values)
Params: (values: Seq[Any])
Result: Row
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Row$.html
Timestamp: 2020-10-19T01:56:24.277Z
Params: (values: Seq[Any]) Result: Row Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Row$.html Timestamp: 2020-10-19T01:56:24.277Z
(row-number)
Params: ()
Result: Column
Window function: returns a sequential number starting at 1 within a window partition.
1.6.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.540Z
Params: () Result: Column Window function: returns a sequential number starting at 1 within a window partition. 1.6.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.540Z
(rpad expr length pad)
Params: (str: Column, len: Int, pad: String)
Result: Column
Right-pad the string column with pad to a length of len. If the string column is longer than len, the return value is shortened to len characters.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.541Z
Params: (str: Column, len: Int, pad: String) Result: Column Right-pad the string column with pad to a length of len. If the string column is longer than len, the return value is shortened to len characters. 1.5.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.541Z
(rtrim expr)
Params: (e: Column)
Result: Column
Trim the spaces from right end for the specified string value.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.543Z
Params: (e: Column) Result: Column Trim the spaces from right end for the specified string value. 1.5.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.543Z
(runif)
(runif low high)
(runif low high seed)
Returns a new Column of draws from a uniform distribution.
Returns a new Column of draws from a uniform distribution.
(runiform)
(runiform low high)
(runiform low high seed)
Returns a new Column of draws from a uniform distribution.
Returns a new Column of draws from a uniform distribution.
(sample dataframe fraction)
(sample dataframe fraction with-replacement)
Params: (fraction: Double, seed: Long)
Result: Dataset[T]
Returns a new Dataset by sampling a fraction of rows (without replacement), using a user-supplied seed.
Fraction of rows to generate, range [0.0, 1.0].
Seed for sampling.
2.3.0
This is NOT guaranteed to provide exactly the fraction of the count of the given Dataset.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.913Z
Params: (fraction: Double, seed: Long) Result: Dataset[T] Returns a new Dataset by sampling a fraction of rows (without replacement), using a user-supplied seed. Fraction of rows to generate, range [0.0, 1.0]. Seed for sampling. 2.3.0 This is NOT guaranteed to provide exactly the fraction of the count of the given Dataset. Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html Timestamp: 2020-10-19T01:56:20.913Z
(sample-by dataframe expr fractions seed)
Params: (col: String, fractions: Map[T, Double], seed: Long)
Result: DataFrame
Returns a stratified sample without replacement based on the fraction given on each stratum.
stratum type
column that defines strata
sampling fraction for each stratum. If a stratum is not specified, we treat its fraction as zero.
random seed
a new DataFrame that represents the stratified sample
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/DataFrameStatFunctions.html
Timestamp: 2020-10-19T01:56:24.694Z
Params: (col: String, fractions: Map[T, Double], seed: Long) Result: DataFrame Returns a stratified sample without replacement based on the fraction given on each stratum. stratum type column that defines strata sampling fraction for each stratum. If a stratum is not specified, we treat its fraction as zero. random seed a new DataFrame that represents the stratified sample 1.5.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/DataFrameStatFunctions.html Timestamp: 2020-10-19T01:56:24.694Z
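A minimal sketch of stratified sampling with sample-by, using a hypothetical `:label` column; strata not listed in the fractions map are treated as fraction zero and dropped.
```clojure
;; Keep roughly 10% of label-0 rows and 50% of label-1 rows, seeded for repeatability.
(g/sample-by dataframe :label {0 0.1, 1 0.5} 42)
```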
(sc)
(sc spark)
Params:
Result: SparkContext
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html
Timestamp: 2020-10-19T01:56:49.550Z
Params: Result: SparkContext Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html Timestamp: 2020-10-19T01:56:49.550Z
(schema-of-csv expr)
(schema-of-csv expr options)
Params: (csv: String)
Result: Column
Parses a CSV string and infers its schema in DDL format.
a CSV string.
3.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.547Z
Params: (csv: String) Result: Column Parses a CSV string and infers its schema in DDL format. a CSV string. 3.0.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.547Z
(schema-of-json expr)
(schema-of-json expr options)
Params: (json: String)
Result: Column
Parses a JSON string and infers its schema in DDL format.
a JSON string.
2.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.554Z
Params: (json: String) Result: Column Parses a JSON string and infers its schema in DDL format. a JSON string. 2.4.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.554Z
(second expr)
Params: (e: Column)
Result: Column
Extracts the seconds as an integer from a given date/timestamp/string.
An integer, or null if the input was a string that could not be cast to a timestamp
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.555Z
Params: (e: Column) Result: Column Extracts the seconds as an integer from a given date/timestamp/string. An integer, or null if the input was a string that could not be cast to a timestamp 1.5.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.555Z
(select dataframe & exprs)
Params: (cols: Column*)
Result: DataFrame
Selects a set of column based expressions.
2.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.931Z
Params: (cols: Column*) Result: DataFrame Selects a set of column based expressions. 2.0.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html Timestamp: 2020-10-19T01:56:20.931Z
(select-columns dataframe & exprs)
Params: (cols: Column*)
Result: DataFrame
Selects a set of column based expressions.
2.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.931Z
Params: (cols: Column*) Result: DataFrame Selects a set of column based expressions. 2.0.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html Timestamp: 2020-10-19T01:56:20.931Z
(select-expr dataframe & exprs)
Params: (exprs: String*)
Result: DataFrame
Selects a set of SQL expressions. This is a variant of select that accepts SQL expressions.
2.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.933Z
Params: (exprs: String*) Result: DataFrame Selects a set of SQL expressions. This is a variant of select that accepts SQL expressions. 2.0.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html Timestamp: 2020-10-19T01:56:20.933Z
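A minimal sketch of select-expr with hypothetical `:age` and `:delay` columns; the arguments are SQL expression strings rather than Column objects.
```clojure
;; Derived columns expressed directly in SQL.
(-> dataframe
    (g/select-expr "age * 2 AS double_age" "abs(delay) AS abs_delay")
    g/show)
```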
(select-keys expr ks)
Returns a map containing only those entries in map (expr
) whose key is in ks
.
Returns a map containing only those entries in map (`expr`) whose key is in `ks`.
(sequence start stop step)
Params: (start: Column, stop: Column, step: Column)
Result: Column
Generate a sequence of integers from start to stop, incrementing by step.
2.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.557Z
Params: (start: Column, stop: Column, step: Column) Result: Column Generate a sequence of integers from start to stop, incrementing by step. 2.4.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.557Z
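A minimal sketch of sequence, assuming `g/lit` is available as elsewhere in this library.
```clojure
;; Array column [1 3 5 7 9]: start at 1, stop at 10, step by 2.
(g/sequence (g/lit 1) (g/lit 10) (g/lit 2))
```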
(sha-1 expr)
Params: (e: Column)
Result: Column
Calculates the SHA-1 digest of a binary column and returns the value as a 40 character hex string.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.558Z
Params: (e: Column) Result: Column Calculates the SHA-1 digest of a binary column and returns the value as a 40 character hex string. 1.5.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.558Z
(sha-2 expr n-bits)
Params: (e: Column, numBits: Int)
Result: Column
Calculates the SHA-2 family of hash functions of a binary column and returns the value as a hex string.
column to compute SHA-2 on.
one of 224, 256, 384, or 512.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.559Z
Params: (e: Column, numBits: Int) Result: Column Calculates the SHA-2 family of hash functions of a binary column and returns the value as a hex string. column to compute SHA-2 on. one of 224, 256, 384, or 512. 1.5.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.559Z
(sha1 expr)
Params: (e: Column)
Result: Column
Calculates the SHA-1 digest of a binary column and returns the value as a 40 character hex string.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.558Z
Params: (e: Column) Result: Column Calculates the SHA-1 digest of a binary column and returns the value as a 40 character hex string. 1.5.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.558Z
(sha2 expr n-bits)
Params: (e: Column, numBits: Int)
Result: Column
Calculates the SHA-2 family of hash functions of a binary column and returns the value as a hex string.
column to compute SHA-2 on.
one of 224, 256, 384, or 512.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.559Z
Params: (e: Column, numBits: Int) Result: Column Calculates the SHA-2 family of hash functions of a binary column and returns the value as a hex string. column to compute SHA-2 on. one of 224, 256, 384, or 512. 1.5.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.559Z
(shape dataframe)
Returns a vector representing the dimensionality of the Dataset.
Returns a vector representing the dimensionality of the Dataset.
(shift-left expr num-bits)
Params: (e: Column, numBits: Int)
Result: Column
Shift the given value numBits left. If the given value is a long value, this function will return a long value else it will return an integer value.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.560Z
Params: (e: Column, numBits: Int) Result: Column Shift the given value numBits left. If the given value is a long value, this function will return a long value else it will return an integer value. 1.5.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.560Z
(shift-right expr num-bits)
Params: (e: Column, numBits: Int)
Result: Column
(Signed) shift the given value numBits right. If the given value is a long value, it will return a long value else it will return an integer value.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.562Z
Params: (e: Column, numBits: Int) Result: Column (Signed) shift the given value numBits right. If the given value is a long value, it will return a long value else it will return an integer value. 1.5.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.562Z
(shift-right-unsigned expr num-bits)
Params: (e: Column, numBits: Int)
Result: Column
Unsigned shift the given value numBits right. If the given value is a long value, it will return a long value else it will return an integer value.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.563Z
Params: (e: Column, numBits: Int) Result: Column Unsigned shift the given value numBits right. If the given value is a long value, it will return a long value else it will return an integer value. 1.5.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.563Z
(show dataframe)
(show dataframe options)
Params: (numRows: Int)
Result: Unit
Displays the Dataset in a tabular form. Strings more than 20 characters will be truncated, and all cells will be aligned right. For example:
Number of rows to show
1.6.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.945Z
(show-vertical dataframe)
(show-vertical dataframe options)
Displays the Dataset in a list-of-records form.
Column: Returns a random permutation of the given array.
Dataset: Shuffles the rows of the Dataset.
(signum expr)
Params: (e: Column)
Result: Column
Computes the signum of the given value.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.566Z
(sin expr)
Params: (e: Column)
Result: Column
angle in radians
sine of the angle, as if computed by java.lang.Math.sin
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.568Z
(sinh expr)
Params: (e: Column)
Result: Column
hyperbolic angle
hyperbolic sine of the given value, as if computed by java.lang.Math.sinh
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.570Z
(size expr)
Params: (e: Column)
Result: Column
Returns length of array or map.
The function returns null for null input if spark.sql.legacy.sizeOfNull is set to false or spark.sql.ansi.enabled is set to true. Otherwise, the function returns -1 for null input. With the default settings, the function returns -1 for null input.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.571Z
(skewness expr)
Params: (e: Column)
Result: Column
Aggregate function: returns the skewness of the values in a group.
1.6.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.574Z
(slice expr start length)
Params: (x: Column, start: Int, length: Int)
Result: Column
Returns an array containing all the elements in x from index start (or starting from the end if start is negative) with the specified length.
the array column to be sliced
the starting index
the length of the slice
2.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.575Z
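A small illustrative sketch (not from the original docstring) that builds an array column with split and slices it; remember that start is 1-based:
(-> (g/table->dataset [["a,b,c,d"]] [:s])
    (g/with-column :middle (g/slice (g/split :s ",") 2 2))  ; => ["b" "c"]
    g/show)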
(sort dataframe & exprs)
Params: (sortCol: String, sortCols: String*)
Result: Dataset[T]
Returns a new Dataset sorted by the given expressions. This is an alias of the sort function.
2.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.884Z
(sort-array expr)
(sort-array expr asc)
Params: (e: Column)
Result: Column
Sorts the input array for the given column in ascending order, according to the natural ordering of the array elements. Null elements will be placed at the beginning of the returned array.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.577Z
(sort-within-partitions dataframe & exprs)
Params: (sortCol: String, sortCols: String*)
Result: Dataset[T]
Returns a new Dataset with each partition sorted by the given expressions.
This is the same operation as "SORT BY" in SQL (Hive QL).
2.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.950Z
(soundex expr)
Params: (e: Column)
Result: Column
Returns the soundex code for the specified expression.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.578Z
(spark-conf spark-session)
Params:
Result: SparkConf
Return a copy of this JavaSparkContext's configuration. The configuration cannot be changed at runtime.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html
Timestamp: 2020-10-19T01:56:49.511Z
(spark-context)
(spark-context spark)
Params:
Result: SparkContext
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html
Timestamp: 2020-10-19T01:56:49.550Z
(spark-home)
(spark-home spark)
Params: ()
Result: Optional[String]
Get Spark's home location from either a value set through the constructor, or the spark.home Java property, or the SPARK_HOME environment variable (in that order of preference). If neither of these is set, return None.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html
Timestamp: 2020-10-19T01:56:49.518Z
(spark-partition-id)
Params: ()
Result: Column
Partition ID.
1.6.0
This is non-deterministic because it depends on data partitioning and task scheduling.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.579Z
(spark-session dataframe)
Params:
Result: SparkSession
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.951Z
Params: (size: Int, indices: Array[Int], values: Array[Double])
Result: Vector
Creates a sparse vector providing its index array and value array.
vector size.
index array, must be strictly increasing.
value array, must have the same length as indices.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/ml/linalg/Vectors$.html
Timestamp: 2020-10-19T01:56:35.350Z
(split expr pattern)
Params: (str: Column, pattern: String)
Result: Column
Splits str around matches of the given pattern.
a string expression to split
a string representing a regular expression. The regex string should be a Java regular expression.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.582Z
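Because the pattern is a Java regular expression, literal regex metacharacters such as '.' must be escaped. An illustrative sketch (not from the original docstring):
(-> (g/table->dataset [["10.0.0.1"]] [:ip])
    (g/with-column :octets (g/split :ip "\\."))  ; => ["10" "0" "0" "1"]
    g/show)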
(sql spark sql-text)
Executes a SQL query using Spark, returning the result as a DataFrame. The dialect that is used for SQL parsing can be configured with 'spark.sql.dialect'.
(g/sql spark "SELECT * FROM my_table")
(sql-context dataframe)
Params:
Result: SQLContext
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.952Z
(sqr expr)
Returns the value of the first argument raised to the power of two.
(sqrt expr)
Params: (e: Column)
Result: Column
Computes the square root of the specified float value.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.584Z
(starts-with expr literal)
Params: (other: Column)
Result: Column
String starts with. Returns a boolean column based on a string match.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.979Z
(std expr)
Params: (e: Column)
Result: Column
Aggregate function: alias for stddev_samp.
1.6.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.586Z
(stddev expr)
Params: (e: Column)
Result: Column
Aggregate function: alias for stddev_samp.
1.6.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.586Z
(stddev-pop expr)
Params: (e: Column)
Result: Column
Aggregate function: returns the population standard deviation of the expression in a group.
1.6.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.593Z
(stddev-samp expr)
Params: (e: Column)
Result: Column
Aggregate function: returns the sample standard deviation of the expression in a group.
1.6.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.586Z
(storage-level dataframe)
Params:
Result: StorageLevel
Get the Dataset's current storage level, or StorageLevel.NONE if not persisted.
2.1.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.954Z
(streaming? dataframe)
Params:
Result: Boolean
Returns true if this Dataset contains one or more sources that continuously return data as it arrives. A Dataset that reads data from a streaming source must be executed as a StreamingQuery using the start() method in DataStreamWriter. Methods that return a single answer, e.g. count() or collect(), will throw an AnalysisException when there is a streaming source present.
2.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.844Z
(struct & exprs)
Params: (cols: Column*)
Result: Column
Creates a new struct column. If the input column is a column in a DataFrame, or a derived column expression that is named (i.e. aliased), its name would be retained as the StructField's name, otherwise, the newly generated StructField's name would be auto generated as col with a suffix index + 1, i.e. col1, col2, col3, ...
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.597Z
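An illustrative sketch (not from the original docstring) that packs two existing columns into a single struct column; the column names are hypothetical:
(-> (g/table->dataset [[1 "a"]] [:id :tag])
    (g/with-column :packed (g/struct :id :tag))
    g/show)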
(struct-field col-name data-type nullable)
Creates a StructField by specifying the name `col-name`, the data type `data-type`, and whether values of this field can be null (`nullable`).
(struct-type & fields)
Creates a StructType with the given list of StructFields `fields`.
(substring expr pos len)
Params: (str: Column, pos: Int, len: Int)
Result: Column
Substring starts at pos and is of length len when str is String type or returns the slice of byte array that starts at pos in byte and is of length len when str is Binary type
1.5.0
The position is not zero based, but 1 based index.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.599Z
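Since the position is 1-based, taking the first three characters uses pos 1 and len 3. An illustrative sketch (not from the original docstring):
(-> (g/table->dataset [["Spark"]] [:word])
    (g/with-column :prefix (g/substring :word 1 3))  ; => "Spa"
    g/show)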
(substring-index expr delim cnt)
Params: (str: Column, delim: String, count: Int)
Result: Column
Returns the substring from string str before count occurrences of the delimiter delim. If count is positive, everything to the left of the final delimiter (counting from the left) is returned. If count is negative, everything to the right of the final delimiter (counting from the right) is returned. substring_index performs a case-sensitive match when searching for delim.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.600Z
Column: Aggregate function: returns the sum of all values in the given column.
RelationalGroupedDataset: Compute the sum for each numeric columns for each group.
(sum-distinct expr)
Params: (e: Column)
Result: Column
Aggregate function: returns the sum of distinct values in the expression.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.604Z
(summary dataframe & stat-names)
Params: (statistics: String*)
Result: DataFrame
Computes specified statistics for numeric and string columns. Available statistics are: count, mean, stddev, min, max, and arbitrary approximate percentiles specified as a percentage (e.g. 75%).
If no statistics are given, this function computes count, mean, stddev, min, approximate quartiles (percentiles at 25%, 50%, and 75%), and max.
This function is meant for exploratory data analysis, as we make no guarantee about the backward compatibility of the schema of the resulting Dataset. If you want to programmatically compute summary statistics, use the agg function instead.
To do a summary for specific columns, first select them.
See also describe for basic statistics.
Statistics from above list to be computed.
2.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.957Z
(table->dataset table col-names)
(table->dataset spark table col-names)
Construct a Dataset from a collection of collections.
(g/show (g/table->dataset [[1 2] [3 4]] [:a :b]))
; +---+---+
; |a |b |
; +---+---+
; |1 |2 |
; |3 |4 |
; +---+---+
(tail dataframe n-rows)
Params: (n: Int)
Result: Array[T]
Returns the last n rows in the Dataset.
Running tail requires moving data into the application's driver process, and doing so with a very large n can crash the driver process with OutOfMemoryError.
3.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.959Z
(tail-vals dataframe n-rows)
Returns the vector values of the last n rows in the Dataset collected.
(take dataframe n-rows)
Params: (n: Int)
Result: Array[T]
Returns the first n rows in the Dataset.
Running take requires moving data into the application's driver process, and doing so with a very large n can crash the driver process with OutOfMemoryError.
1.6.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.961Z
(take-vals dataframe n-rows)
Returns the vector values of the first n rows in the Dataset collected.
(tan expr)
Params: (e: Column)
Result: Column
angle in radians
tangent of the given value, as if computed by java.lang.Math.tan
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.607Z
(tanh expr)
Params: (e: Column)
Result: Column
hyperbolic angle
hyperbolic tangent of the given value, as if computed by java.lang.Math.tanh
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.610Z
(time-window time-expr duration)
(time-window time-expr duration slide)
(time-window time-expr duration slide start)
Params: (timeColumn: Column, windowDuration: String, slideDuration: String, startTime: String)
Result: Column
Bucketize rows into one or more time windows given a timestamp specifying column. Window starts are inclusive but the window ends are exclusive, e.g. 12:05 will be in the window [12:05,12:10) but not in [12:00,12:05). Windows can support microsecond precision. Windows in the order of months are not supported. The following example takes the average stock price for a one minute window every 10 seconds starting 5 seconds after the hour:
The windows will look like:
For a streaming query, you may use the function current_timestamp to generate windows on processing time.
The column or the expression to use as the timestamp for windowing by time. The time column must be of TimestampType.
A string specifying the width of the window, e.g. 10 minutes, 1 second. Check org.apache.spark.unsafe.types.CalendarInterval for valid duration identifiers. Note that the duration is a fixed length of time, and does not vary over time according to a calendar. For example, 1 day always means 86,400,000 milliseconds, not a calendar day.
A string specifying the sliding interval of the window, e.g. 1 minute. A new window will be generated every slideDuration. Must be less than or equal to the windowDuration. Check org.apache.spark.unsafe.types.CalendarInterval for valid duration identifiers. This duration is likewise absolute, and does not vary according to a calendar.
The offset with respect to 1970-01-01 00:00:00 UTC with which to start window intervals. For example, in order to have hourly tumbling windows that start 15 minutes past the hour, e.g. 12:15-13:15, 13:15-14:15... provide startTime as 15 minutes.
2.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.732Z
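A hedged sketch (not from the original docstring) that only materialises the window column; it assumes a DataFrame df with a TimestampType column :event-time, and in practice the window column is typically fed into a group-by/aggregation:
(-> df
    (g/with-column :window (g/time-window :event-time "10 minutes"))
    g/show)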
(to-byte-array cms)
Params: ()
Result: Array[Byte]
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/util/sketch/CountMinSketch.html
Timestamp: 2020-10-19T01:56:26.107Z
(to-csv expr)
(to-csv expr options)
Params: (e: Column, options: Map[String, String])
Result: Column
(Java-specific) Converts a column containing a StructType into a CSV string with the specified schema. Throws an exception, in the case of an unsupported type.
a column containing a struct.
options to control how the struct column is converted into a CSV string. It accepts the same options as the CSV data source.
3.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.613Z
(to-date expr)
(to-date expr date-format)
Params: (e: Column)
Result: Column
Converts the column into DateType by casting rules to DateType.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.616Z
Coerce to string; useful for debugging.
Collection: alias for `table->dataset`. Dataset: Converts this strongly typed collection of data to generic DataFrame with columns renamed.
Column: Converts a column containing a StructType, ArrayType or a MapType into a JSON string with the specified schema.
Dataset: Returns the content of the Dataset as a Dataset of JSON strings.
(to-timestamp expr)
(to-timestamp expr date-format)
Params: (s: Column)
Result: Column
Converts to a timestamp by casting rules to TimestampType.
A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
A timestamp, or null if the input was a string that could not be cast to a timestamp
2.2.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.623Z
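An illustrative sketch (not from the original docstring) of the two-argument arity, parsing a string column with an explicit format:
(-> (g/table->dataset [["2020-10-19 01:56:22"]] [:raw])
    (g/with-column :ts (g/to-timestamp :raw "yyyy-MM-dd HH:mm:ss"))
    g/show)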
(to-utc-timestamp expr)
Params: (ts: Column, tz: String)
Result: Column
Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in the given time zone, and renders that time as a timestamp in UTC. For example, 'GMT+1' would yield '2017-07-14 01:40:00.0'.
A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
A string detailing the time zone ID that the input should be adjusted to. It should be in the format of either region-based zone IDs or zone offsets. Region IDs must have the form 'area/city', such as 'America/Los_Angeles'. Zone offsets must be in the format '(+|-)HH:mm', for example '-08:00' or '+01:00'. Also 'UTC' and 'Z' are supported as aliases of '+00:00'. Other short names are not recommended to use because they can be ambiguous.
A timestamp, or null if ts was a string that could not be cast to a timestamp or tz was an invalid value
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.626Z
(total-count cms)
Params: ()
Result: Long
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/util/sketch/CountMinSketch.html
Timestamp: 2020-10-19T01:56:26.108Z
(transform expr xform-fn)
Params: (column: Column, f: (Column) ⇒ Column)
Result: Column
Returns an array of elements after applying a transformation to each element in the input array.
the input array column
col => transformed_col, the lambda function to transform the input column
3.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.629Z
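A hedged sketch (not from the original docstring), assuming the wrapper accepts an ordinary Clojure function of one Column for the lambda argument, mirroring the (Column) ⇒ Column signature above:
(-> (g/table->dataset [["a,b,c"]] [:csv])
    (g/with-column :xs (g/split :csv ","))
    (g/with-column :upper-xs (g/transform :xs g/upper))  ; => ["A" "B" "C"]
    g/show)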
(transform-keys expr key-fn)
Params: (expr: Column, f: (Column, Column) ⇒ Column)
Result: Column
Applies a function to every key-value pair in a map and returns a map with the results of those applications as the new keys for the pairs.
the input map column
(key, value) => new_key, the lambda function to transform the key of input map column
3.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.630Z
(transform-values expr key-fn)
Params: (expr: Column, f: (Column, Column) ⇒ Column)
Result: Column
Applies a function to every key-value pair in a map and returns a map with the results of those applications as the new values for the pairs.
the input map column
(key, value) => new_value, the lambda function to transform the value of input map column
3.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.638Z
(translate expr match replacement)
Params: (src: Column, matchingString: String, replaceString: String)
Result: Column
Translate any character in the src by a character in replaceString. The characters in replaceString correspond to the characters in matchingString. The translate will happen when any character in the string matches the character in the matchingString.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.639Z
(trim expr trim-string)
Params: (e: Column)
Result: Column
Trim the spaces from both ends for the specified string column.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.641Z
(unbase-64 expr)
Params: (e: Column)
Result: Column
Decodes a BASE64 encoded string column and returns it as a binary column. This is the reverse of base64.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.702Z
(unbase64 expr)
Params: (e: Column)
Result: Column
Decodes a BASE64 encoded string column and returns it as a binary column. This is the reverse of base64.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.702Z
Params:
Result: Long
Value representing the last row in the partition, equivalent to "UNBOUNDED FOLLOWING" in SQL. This can be used to specify the frame boundaries:
2.1.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/expressions/Window$.html
Timestamp: 2020-10-19T01:56:25.054Z
Params:
Result: Long
Value representing the first row in the partition, equivalent to "UNBOUNDED PRECEDING" in SQL. This can be used to specify the frame boundaries:
2.1.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/expressions/Window$.html
Timestamp: 2020-10-19T01:56:25.055Z
(unhex expr)
Params: (column: Column)
Result: Column
Inverse of hex. Interprets each pair of characters as a hexadecimal number and converts to the byte representation of number.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.703Z
(union & dataframes)
Params: (other: Dataset[T])
Result: Dataset[T]
Returns a new Dataset containing union of rows in this Dataset and another Dataset.
This is equivalent to UNION ALL in SQL. To do a SQL-style set union (that does deduplication of elements), use this function followed by a distinct.
Also as standard in SQL, this function resolves columns by position (not by name):
Notice that the column positions in the schema aren't necessarily matched with the fields in the strongly typed objects in a Dataset. This function resolves columns by their positions in the schema, not the fields in the strongly typed objects. Use unionByName to resolve columns by field name in the typed objects.
2.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.974Z
(union-by-name & dataframes)
Params: (other: Dataset[T])
Result: Dataset[T]
Returns a new Dataset containing union of rows in this Dataset and another Dataset.
This is different from both UNION ALL and UNION DISTINCT in SQL. To do a SQL-style set union (that does deduplication of elements), use this function followed by a distinct.
The difference between this function and union is that this function resolves columns by name (not by position):
2.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.978Z
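An illustrative sketch (not from the original docstrings) contrasting positional and name-based resolution; the columns and values are hypothetical:
(let [df-a (g/table->dataset [[1 10]] [:id :score])
      df-b (g/table->dataset [[20 2]] [:score :id])]
  (g/show (g/union df-a df-b))          ; resolves by position, so 20 lands under :id
  (g/show (g/union-by-name df-a df-b))) ; resolves by name, so :id keeps 1 and 2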
(unix-timestamp)
(unix-timestamp expr)
(unix-timestamp expr pattern)
Params: ()
Result: Column
Returns the current Unix timestamp (in seconds) as a long.
1.5.0
All calls of unix_timestamp within the same query return the same value (i.e. the current timestamp is calculated at the start of query evaluation).
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.710Z
(unpersist dataframe)
(unpersist dataframe blocking)
Params: (blocking: Boolean)
Result: Dataset.this.type
Mark the Dataset as non-persistent, and remove all blocks for it from memory and disk. This will not un-persist any cached data that is built upon this Dataset.
Whether to block until all blocks are deleted.
1.6.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.980Z
Column: `transform-values` with Clojure's `assoc` signature. Dataset: `with-column` with Clojure's `assoc` signature.
(upper expr)
Params: (e: Column)
Result: Column
Converts a string column to upper case.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.712Z
(vals expr)
Params: (e: Column)
Result: Column
Returns an unordered array containing the values of the map.
2.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.473Z
(value-counts dataframe)
Returns a Dataset containing counts of unique rows.
The resulting object will be in descending order so that the first element is the most frequently-occurring element.
(var-pop expr)
Params: (e: Column)
Result: Column
Aggregate function: returns the population variance of the values in a group.
1.6.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.714Z
(var-samp expr)
Params: (e: Column)
Result: Column
Aggregate function: returns the unbiased variance of the values in a group.
1.6.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.718Z
(variance expr)
Params: (e: Column)
Result: Column
Aggregate function: alias for var_samp.
1.6.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.718Z
(version)
(version spark)
Params:
Result: String
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html
Timestamp: 2020-10-19T01:56:49.576Z
(week-of-year expr)
Params: (e: Column)
Result: Column
Extracts the week number as an integer from a given date/timestamp/string.
A week is considered to start on a Monday and week 1 is the first week with more than 3 days, as defined by ISO 8601
An integer, or null if the input was a string that could not be cast to a date
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.723Z
(weekofyear expr)
Params: (e: Column)
Result: Column
Extracts the week number as an integer from a given date/timestamp/string.
A week is considered to start on a Monday and week 1 is the first week with more than 3 days, as defined by ISO 8601
An integer, or null if the input was a string that could not be cast to a date
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.723Z
(when condition if-expr)
(when condition if-expr else-expr)
Params: (condition: Column, value: Any)
Result: Column
Evaluates a list of conditions and returns one of multiple possible result expressions. If otherwise is not defined at the end, null is returned for unmatched conditions.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.724Z
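An illustrative sketch (not from the original docstring) of the three-argument arity; it assumes the library's < comparison is available for Columns:
(-> (g/table->dataset [[15] [42]] [:age])
    (g/with-column :group (g/when (g/< :age 18) "minor" "adult"))
    g/show)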
Column: Returns an array of elements for which a predicate holds in a given array.
Dataset: Filters rows using the given condition.
(width cms)
Params: ()
Result: Int
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/util/sketch/CountMinSketch.html
Timestamp: 2020-10-19T01:56:26.108Z
(window {:keys [partition-by order-by range-between rows-between]})
Utility functions for defining window in DataFrames.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/expressions/Window$.html
Timestamp: 2020-10-19T01:55:47.755Z
(windowed options)
Shortcut to create WindowSpec that takes a map as the argument.
Expected keys: [:partition-by :order-by :range-between :rows-between]
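A hedged sketch (not from the original docstring); it assumes g/over and g/row-number exist in the library, and the :dept and :salary columns are hypothetical:
(-> df
    (g/with-column :rank
      (g/over (g/row-number)
              (g/windowed {:partition-by :dept :order-by :salary})))
    g/show)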
(with-column dataframe col-name expr)
Params: (colName: String, col: Column)
Result: DataFrame
Returns a new Dataset by adding a column or replacing the existing column that has the same name.
column's expression must only refer to attributes supplied by this Dataset. It is an error to add a column that refers to some other Dataset.
2.0.0
this method introduces a projection internally. Therefore, calling it multiple times, for instance, via loops in order to add multiple columns can generate big plans which can cause performance issues and even StackOverflowException. To avoid this, use select with the multiple columns at once.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.987Z
(with-column-renamed dataframe old-name new-name)
Params: (existingName: String, newName: String)
Result: DataFrame
Returns a new Dataset with a column renamed. This is a no-op if schema doesn't contain existingName.
2.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.988Z
(write-avro! dataframe path)
(write-avro! dataframe path options)
Writes an Avro file at the specified path.
Spark's DataFrameWriter options may be passed in as a map of options.
See: https://spark.apache.org/docs/latest/sql-data-sources.html
(write-csv! dataframe path)
(write-csv! dataframe path options)
Writes a CSV file at the specified path.
Spark's DataFrameWriter options may be passed in as a map of options.
See: https://spark.apache.org/docs/latest/sql-data-sources.html
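For example, a sketch of writing with a couple of DataFrameWriter options. The dataframe `df`, the output path, and the string option keys (mirroring Spark's own CSV option names) are assumptions:
;; Sketch only: `g`, `df`, the path, and the option-key format are assumptions.
(g/write-csv! df "/tmp/report-csv" {"header" "true"
                                    "sep"    "|"})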
(write-edn! dataframe path)
(write-edn! dataframe path options)
Writes an EDN file at the specified path.
(write-jdbc! dataframe options)
Writes a database table.
Spark's DataFrameWriter options may be passed in as a map of options.
See: https://spark.apache.org/docs/latest/sql-data-sources.html
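A sketch using Spark's standard JDBC writer options. The URL, table, and credentials are placeholders, and string option keys are assumed:
;; Sketch only: `g`, `df`, and all option values are placeholders.
(g/write-jdbc! df {"url"      "jdbc:postgresql://localhost:5432/mydb"
                   "dbtable"  "public.sales"
                   "user"     "spark"
                   "password" "secret"})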
(write-json! dataframe path)
(write-json! dataframe path options)
Writes a JSON file at the specified path.
Spark's DataFrameWriter options may be passed in as a map of options.
See: https://spark.apache.org/docs/latest/sql-data-sources-json.html
(write-libsvm! dataframe path)
(write-libsvm! dataframe path options)
Writes a LIBSVM file at the specified path.
Spark's DataFrameWriter options may be passed in as a map of options.
See: https://spark.apache.org/docs/latest/sql-data-sources.html
(write-parquet! dataframe path)
(write-parquet! dataframe path options)
Writes a Parquet file at the specified path.
Spark's DataFrameWriter options may be passed in as a map of options.
See: https://spark.apache.org/docs/latest/sql-data-sources-parquet.html
(write-table! dataframe table-name)
(write-table! dataframe table-name options)
Writes the dataset to a managed (Hive) table.
(write-text! dataframe path)
(write-text! dataframe path options)
Writes a text file at the specified path.
Spark's DataFrameWriter options may be passed in as a map of options.
See: https://spark.apache.org/docs/latest/sql-data-sources.html
(write-xlsx! dataframe path)
(write-xlsx! dataframe path options)
Writes an Excel file at the specified path.
(xxhash-64 & exprs)
Params: (cols: Column*)
Result: Column
Calculates the hash code of given columns using the 64-bit variant of the xxHash algorithm, and returns the result as a long column.
3.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.733Z
(xxhash64 & exprs)
Params: (cols: Column*)
Result: Column
Calculates the hash code of given columns using the 64-bit variant of the xxHash algorithm, and returns the result as a long column.
3.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.733Z
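For example, a sketch that adds a 64-bit hash over two columns. The dataframe `df`, the column names, and the `g`/`col` helpers are assumptions:
;; Sketch only: `g`, `df`, `col`, and the column names are assumptions.
(-> df
    (g/with-column "row-hash" (g/xxhash64 (g/col "id") (g/col "email"))))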
(year expr)
Params: (e: Column)
Result: Column
Extracts the year as an integer from a given date/timestamp/string.
Returns an integer, or null if the input was a string that could not be cast to a date.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.734Z
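A minimal sketch. The dataframe `df`, the "purchased-at" column, and the `g`/`col` helpers are assumptions:
;; Sketch only: `g`, `df`, `col`, and the column name are assumptions.
(-> df
    (g/with-column "purchase-year" (g/year (g/col "purchased-at"))))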
(zero? expr)
Returns true if `expr` is zero, else false.
(zip-with left right merge-fn)
Params: (left: Column, right: Column, f: (Column, Column) ⇒ Column)
Result: Column
Merge two given arrays, element-wise, into a single array using a function. If one array is shorter, nulls are appended at the end to match the length of the longer array, before applying the function.
left: the left input array column.
right: the right input array column.
merge-fn: (lCol, rCol) => col, the lambda function to merge two input columns into one column.
3.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.737Z
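For example, a sketch that sums two array columns element-wise, using the library's + as the two-argument merge function. The dataframe `df`, the column names, and the `g`/`col` helpers are assumptions:
;; Sketch only: `g`, `df`, `col`, and the column names are assumptions.
(-> df
    (g/with-column "totals" (g/zip-with (g/col "xs") (g/col "ys") g/+)))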
(zipmap key-expr val-expr)
Params: (keys: Column, values: Column)
Result: Column
Creates a new map column. The array in the first column is used for keys. The array in the second column is used for values. All elements in the array for key should not be null.
2.4
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.470Z
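A minimal sketch that builds a map column from a keys array and a values array. The dataframe `df`, the column names, and the `g`/`col` helpers are assumptions:
;; Sketch only: `g`, `df`, `col`, and the column names are assumptions.
(-> df
    (g/with-column "kv" (g/zipmap (g/col "ks") (g/col "vs"))))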
(| left-expr right-expr)
Params: (other: Any)
Result: Column
Compute bitwise OR of this expression with another expression.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.879Z
(|| & exprs)
Params: (other: Any)
Result: Column
Boolean OR.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.994Z