
zero-one.geni.core.functions


!

(! expr)

Params: (e: Column)

Result: Column

Inversion of boolean expression, i.e. NOT.

1.3.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.497Z

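A minimal usage sketch, assuming zero-one.geni.core is aliased as g and df is a hypothetical dataframe with a boolean column :paid (keywords are coerced to columns as usual in Geni):

(require '[zero-one.geni.core :as g])

;; Keep only the rows where :paid is false.
(-> df (g/filter (g/! :paid)))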

**

(** base exponent)

Params: (l: Column, r: Column)

Result: Column

Returns the value of the first argument raised to the power of the second argument.

1.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.520Z

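A hypothetical sketch, assuming the g alias and a numeric column :x; g/lit wraps the literal exponent as a column:

;; Square the :x column.
(-> df (g/with-column :x-squared (g/** :x (g/lit 2))))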

->date-col

(->date-col expr)
(->date-col expr date-format)

Params: (e: Column)

Result: Column

Converts the column into DateType by casting rules to DateType.

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.616Z

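A sketch assuming the g alias and a string column :dob holding dates such as "1990-05-01"; the second arity takes an explicit date format:

(-> df (g/with-column :dob-date (g/->date-col :dob)))
(-> df (g/with-column :dob-date (g/->date-col :dob "dd/MM/yyyy")))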

->timestamp-col

(->timestamp-col expr)
(->timestamp-col expr date-format)

Params: (s: Column)

Result: Column

Converts to a timestamp by casting rules to TimestampType.

A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS

A timestamp, or null if the input was a string that could not be cast to a timestamp

2.2.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.623Z

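A sketch assuming the g alias and a string column :created-at in a castable format such as yyyy-MM-dd HH:mm:ss:

(-> df (g/with-column :created-ts (g/->timestamp-col :created-at)))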

->utc-timestamp

(->utc-timestamp expr)

Params: (ts: Column, tz: String)

Result: Column

Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in the given time zone, and renders that time as a timestamp in UTC. For example, 'GMT+1' would yield '2017-07-14 01:40:00.0'.

A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS

A string detailing the time zone ID that the input should be adjusted to. It should be in the format of either region-based zone IDs or zone offsets. Region IDs must have the form 'area/city', such as 'America/Los_Angeles'. Zone offsets must be in the format '(+|-)HH:mm', for example '-08:00' or '+01:00'. Also 'UTC' and 'Z' are supported as aliases of '+00:00'. Other short names are not recommended to use because they can be ambiguous.

A timestamp, or null if ts was a string that could not be cast to a timestamp or tz was an invalid value

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.626Z


abs

(abs expr)

Params: (e: Column)

Result: Column

Computes the absolute value of a numeric value.

1.3.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.169Z


acos

(acos expr)

Params: (e: Column)

Result: Column

inverse cosine of e in radians, as if computed by java.lang.Math.acos

1.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.171Z


add-months

(add-months expr months)

Params: (startDate: Column, numMonths: Int)

Result: Column

Returns the date that is numMonths after startDate.

A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS

The number of months to add to startDate, can be negative to subtract months

A date, or null if startDate was a string that could not be cast to a date

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.174Z

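A sketch assuming the g alias and a date column :start-date; a negative month count subtracts months:

(-> df (g/with-column :renewal-date (g/add-months :start-date 3)))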

aggregate

(aggregate expr init merge-fn)
(aggregate expr init merge-fn finish-fn)

Params: (expr: Column, initialValue: Column, merge: (Column, Column) ⇒ Column, finish: (Column) ⇒ Column)

Result: Column

Applies a binary operator to an initial state and all elements in the array, and reduces this to a single state. The final state is converted into the final result by applying a finish function.

the input array column

the initial value

(combined_value, input_value) => combined_value, the merge function to merge an input value to the combined_value

combined_value => final_value, the lambda function to convert the combined value of all inputs to final result

3.0.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.177Z

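A sketch assuming the g alias and a hypothetical array-of-numbers column :xs; the merge function receives two columns, so g/+ can be passed directly:

;; Sum the elements of :xs, starting from 0.
(-> df (g/with-column :total (g/aggregate :xs (g/lit 0) g/+)))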

approx-count-distinct

(approx-count-distinct expr)
(approx-count-distinct expr rsd)

Params: (e: Column)

Result: Column

(Since version 2.1.0) Use approx_count_distinct

1.3.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.742Z


array

(array & exprs)

Params: (cols: Column*)

Result: Column

Creates a new array column. The input columns must all have the same data type.

1.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.184Z


array-contains

(array-contains expr value)

Params: (column: Column, value: Any)

Result: Column

Returns null if the array is null, true if the array contains value, and false otherwise.

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.185Z

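A sketch assuming the g alias and an array column :tags:

;; Keep rows whose :tags array contains "clojure".
(-> df (g/filter (g/array-contains :tags "clojure")))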

array-distinct

(array-distinct expr)

Params: (e: Column)

Result: Column

Removes duplicate values from the array.

2.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.186Z


array-except

(array-except left right)

Params: (col1: Column, col2: Column)

Result: Column

Returns an array of the elements in the first array but not in the second array, without duplicates. The order of elements in the result is not determined

2.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.188Z


array-intersect

(array-intersect left right)

Params: (col1: Column, col2: Column)

Result: Column

Returns an array of the elements in the intersection of the given two arrays, without duplicates.

2.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.189Z


array-join

(array-join expr delimiter)
(array-join expr delimiter null-replacement)

Params: (column: Column, delimiter: String, nullReplacement: String)

Result: Column

Concatenates the elements of column using the delimiter. Null values are replaced with nullReplacement.

2.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.194Z

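A sketch assuming the g alias and an array-of-strings column :tags; the third argument replaces null elements:

(-> df (g/with-column :tag-list (g/array-join :tags ", " "?")))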

array-max

(array-max expr)

Params: (e: Column)

Result: Column

Returns the maximum value in the array.

2.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.195Z


array-min

(array-min expr)

Params: (e: Column)

Result: Column

Returns the minimum value in the array.

2.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.197Z


array-position

(array-position expr value)

Params: (column: Column, value: Any)

Result: Column

Locates the position of the first occurrence of the value in the given array as long. Returns null if either of the arguments are null.

2.4.0

The position is not zero based, but 1 based index. Returns 0 if value could not be found in array.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.198Z


array-remove

(array-remove expr element)

Params: (column: Column, element: Any)

Result: Column

Removes all elements that are equal to element from the given array.

2.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.199Z


array-repeat

(array-repeat left right)

Params: (left: Column, right: Column)

Result: Column

Creates an array containing the left argument repeated the number of times given by the right argument.

2.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.201Z


array-sort

(array-sort expr)

Params: (e: Column)

Result: Column

Sorts the input array in ascending order. The elements of the input array must be orderable. Null elements will be placed at the end of the returned array.

2.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.202Z


array-union

(array-union left right)

Params: (col1: Column, col2: Column)

Result: Column

Returns an array of the elements in the union of the given two arrays, without duplicates.

2.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.204Z


arrays-overlap

(arrays-overlap left right)

Params: (a1: Column, a2: Column)

Result: Column

Returns true if a1 and a2 have at least one non-null element in common. If not and both the arrays are non-empty and any of them contains a null, it returns null. It returns false otherwise.

2.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.209Z


arrays-zip

(arrays-zip & exprs)

Params: (e: Column*)

Result: Column

Returns a merged array of structs in which the N-th struct contains all N-th values of input arrays.

2.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.211Z


ascii

(ascii expr)

Params: (e: Column)

Result: Column

Computes the numeric value of the first character of the string column, and returns the result as an int column.

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.216Z


asin

(asin expr)

Params: (e: Column)

Result: Column

inverse sine of e in radians, as if computed by java.lang.Math.asin

1.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.219Z


atan

(atan expr)

Params: (e: Column)

Result: Column

inverse tangent of e, as if computed by java.lang.Math.atan

1.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.221Z


atan-2

(atan-2 expr-x expr-y)

Params: (y: Column, x: Column)

Result: Column

coordinate on y-axis

coordinate on x-axis

the theta component of the point (r, theta) in polar coordinates that corresponds to the point (x, y) in Cartesian coordinates, as if computed by java.lang.Math.atan2

1.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.233Z


atan2

(atan2 expr-x expr-y)

Params: (y: Column, x: Column)

Result: Column

coordinate on y-axis

coordinate on x-axis

the theta component of the point (r, theta) in polar coordinates that corresponds to the point (x, y) in Cartesian coordinates, as if computed by java.lang.Math.atan2

1.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.233Z


base-64

(base-64 expr)

Params: (e: Column)

Result: Column

Computes the BASE64 encoding of a binary column and returns it as a string column. This is the reverse of unbase64.

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.236Z


base64

(base64 expr)

Params: (e: Column)

Result: Column

Computes the BASE64 encoding of a binary column and returns it as a string column. This is the reverse of unbase64.

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.236Z


bin

(bin expr)

Params: (e: Column)

Result: Column

An expression that returns the string representation of the binary value of the given long column. For example, bin("12") returns "1100".

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.238Z


bitwise-not

(bitwise-not expr)

Params: (e: Column)

Result: Column

Computes bitwise NOT (~) of a number.

1.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.239Z


broadcast

(broadcast dataframe)

Params: (df: Dataset[T])

Result: Dataset[T]

Marks a DataFrame as small enough for use in broadcast joins.

For example, the right-hand DataFrame of a join can be marked for a broadcast hash join on the join key.

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.240Z

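A sketch assuming the g alias and that the hypothetical frames big-df and small-df share an :id column:

;; Hint that small-df is small enough to be broadcast to every executor.
(g/join big-df (g/broadcast small-df) :id)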

bround

(bround expr)

Params: (e: Column)

Result: Column

Returns the value of the column e rounded to 0 decimal places with HALF_EVEN round mode.

2.0.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.243Z


cbrt

(cbrt expr)

Params: (e: Column)

Result: Column

Computes the cube-root of the given value.

1.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.253Z


ceil

(ceil expr)

Params: (e: Column)

Result: Column

Computes the ceiling of the given value.

1.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.255Z


collect-list

(collect-list expr)

Params: (e: Column)

Result: Column

Aggregate function: returns a list of objects with duplicates.

1.6.0

The function is non-deterministic because the order of collected results depends on the order of the rows which may be non-deterministic after a shuffle.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.261Z

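A sketch assuming the g alias and hypothetical columns :order-id and :item:

;; One array of :item values (duplicates kept) per :order-id.
(-> df (g/group-by :order-id) (g/agg (g/collect-list :item)))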

collect-set

(collect-set expr)

Params: (e: Column)

Result: Column

Aggregate function: returns a set of objects with duplicate elements eliminated.

1.6.0

The function is non-deterministic because the order of collected results depends on the order of the rows which may be non-deterministic after a shuffle.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.263Z


concat

(concat & exprs)

Params: (exprs: Column*)

Result: Column

Concatenates multiple input columns together into a single column. The function works with strings, binary and compatible array columns.

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.265Z


concat-ws

(concat-ws sep & exprs)

Params: (sep: String, exprs: Column*)

Result: Column

Concatenates multiple input string columns together into a single string column, using the given separator.

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.267Z

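A sketch assuming the g alias and string columns :first-name and :last-name:

(-> df (g/with-column :full-name (g/concat-ws " " :first-name :last-name)))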

conv

(conv expr from-base to-base)

Params: (num: Column, fromBase: Int, toBase: Int)

Result: Column

Convert a number in a string column from one base to another.

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.268Z


cos

(cos expr)

Params: (e: Column)

Result: Column

angle in radians

cosine of the angle, as if computed by java.lang.Math.cos

1.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.272Z


cosh

(cosh expr)

Params: (e: Column)

Result: Column

hyperbolic angle

hyperbolic cosine of the angle, as if computed by java.lang.Math.cosh

1.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.275Z


count-distinct

(count-distinct & exprs)

Params: (expr: Column, exprs: Column*)

Result: Column

Aggregate function: returns the number of distinct items in a group.

1.3.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.279Z

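A sketch assuming the g alias and hypothetical columns :country, :user-id and :day:

;; Count distinct (:user-id, :day) pairs within each :country.
(-> df (g/group-by :country) (g/agg (g/count-distinct :user-id :day)))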

covar

(covar l-expr r-expr)

Params: (column1: Column, column2: Column)

Result: Column

Aggregate function: returns the sample covariance for two columns.

2.0.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.284Z


covar-pop

(covar-pop l-expr r-expr)

Params: (column1: Column, column2: Column)

Result: Column

Aggregate function: returns the population covariance for two columns.

2.0.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.282Z


covar-samp

(covar-samp l-expr r-expr)

Params: (column1: Column, column2: Column)

Result: Column

Aggregate function: returns the sample covariance for two columns.

2.0.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.284Z


crc-32

(crc-32 expr)

Params: (e: Column)

Result: Column

Calculates the cyclic redundancy check value (CRC32) of a binary column and returns the value as a bigint.

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.285Z


crc32

(crc32 expr)

Params: (e: Column)

Result: Column

Calculates the cyclic redundancy check value (CRC32) of a binary column and returns the value as a bigint.

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.285Z


cube-root

(cube-root expr)

Params: (e: Column)

Result: Column

Computes the cube-root of the given value.

1.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.253Z


cume-dist

(cume-dist)

Params: ()

Result: Column

Window function: returns the cumulative distribution of values within a window partition, i.e. the fraction of rows that are below the current row.

1.6.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.286Z


current-date

(current-date)

Params: ()

Result: Column

Returns the current date as a date column.

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.287Z


current-timestamp

(current-timestamp)

Params: ()

Result: Column

Returns the current timestamp as a timestamp column.

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.288Z


date-add

(date-add expr days)

Params: (start: Column, days: Int)

Result: Column

Returns the date that is days days after start.

A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS

The number of days to add to start, can be negative to subtract days

A date, or null if start was a string that could not be cast to a date

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.295Z


date-diff

(date-diff l-expr r-expr)

Params: (end: Column, start: Column)

Result: Column

Returns the number of days from start to end.

Only considers the date part of the inputs.

A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS

A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS

An integer, or null if either end or start were strings that could not be cast to a date. Negative if end is before start

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.304Z

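A sketch assuming the g alias and date columns :end-date and :start-date, with the first argument taken as the end date as in the underlying Spark datediff:

(-> df (g/with-column :days-elapsed (g/date-diff :end-date :start-date)))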

date-format

(date-format expr date-fmt)

Params: (dateExpr: Column, format: String)

Result: Column

Converts a date/timestamp/string to a value of string in the format specified by the date format given by the second argument.

See Datetime Patterns for valid date and time format patterns

A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS

A pattern dd.MM.yyyy would return a string like 18.03.1993

A string, or null if dateExpr was a string that could not be cast to a timestamp

1.5.0

IllegalArgumentException if the format pattern is invalid

Use specialized functions like year whenever possible as they benefit from a specialized implementation.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.297Z

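A sketch assuming the g alias and a date column :birth-date:

;; Render :birth-date as a string like 18.03.1993.
(-> df (g/with-column :birth-date-str (g/date-format :birth-date "dd.MM.yyyy")))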

date-sub

(date-sub expr days)

Params: (start: Column, days: Int)

Result: Column

Returns the date that is days days before start.

A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS

The number of days to subtract from start, can be negative to add days

A date, or null if start was a string that could not be cast to a date

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.300Z


date-trunc

(date-trunc fmt expr)

Params: (format: String, timestamp: Column)

Result: Column

Returns timestamp truncated to the unit specified by the format.

For example, date_trunc("year", "2018-11-19 12:01:19") returns 2018-01-01 00:00:00

A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS

A timestamp, or null if timestamp was a string that could not be cast to a timestamp or format was an invalid value

2.3.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.302Z


datediff

(datediff l-expr r-expr)

Params: (end: Column, start: Column)

Result: Column

Returns the number of days from start to end.

Only considers the date part of the inputs.

A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS

A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS

An integer, or null if either end or start were strings that could not be cast to a date. Negative if end is before start

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.304Z


day-of-month

(day-of-month expr)

Params: (e: Column)

Result: Column

Extracts the day of the month as an integer from a given date/timestamp/string.

An integer, or null if the input was a string that could not be cast to a date

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.305Z


day-of-week

(day-of-week expr)

Params: (e: Column)

Result: Column

Extracts the day of the week as an integer from a given date/timestamp/string. Ranges from 1 for a Sunday through to 7 for a Saturday

An integer, or null if the input was a string that could not be cast to a date

2.3.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.306Z


day-of-year

(day-of-year expr)

Params: (e: Column)

Result: Column

Extracts the day of the year as an integer from a given date/timestamp/string.

An integer, or null if the input was a string that could not be cast to a date

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.307Z


dayofmonth

(dayofmonth expr)

Params: (e: Column)

Result: Column

Extracts the day of the month as an integer from a given date/timestamp/string.

An integer, or null if the input was a string that could not be cast to a date

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.305Z


dayofweek

(dayofweek expr)

Params: (e: Column)

Result: Column

Extracts the day of the week as an integer from a given date/timestamp/string. Ranges from 1 for a Sunday through to 7 for a Saturday

An integer, or null if the input was a string that could not be cast to a date

2.3.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.306Z


dayofyear

(dayofyear expr)

Params: (e: Column)

Result: Column

Extracts the day of the year as an integer from a given date/timestamp/string.

An integer, or null if the input was a string that could not be cast to a date

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.307Z


decode

(decode expr charset)

Params: (value: Column, charset: String)

Result: Column

Computes the first argument into a string from a binary using the provided character set (one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16'). If either argument is null, the result will also be null.

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.309Z


degrees

(degrees expr)

Params: (e: Column)

Result: Column

Converts an angle measured in radians to an approximately equivalent angle measured in degrees.

angle in radians

angle in degrees, as if computed by java.lang.Math.toDegrees

2.1.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.312Z


dense-rank

(dense-rank)

Params: ()

Result: Column

Window function: returns the rank of rows within a window partition, without any gaps.

The difference between rank and dense_rank is that dense_rank leaves no gaps in the ranking sequence when there are ties. That is, if you were ranking a competition using dense_rank and had three people tie for second place, all three would be in second place and the next person would come in third. rank, by contrast, gives sequential numbers, so the person after the three-way tie for second would come in fifth.

This is equivalent to the DENSE_RANK function in SQL.

1.6.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.313Z


element-at

(element-at expr value)

Params: (column: Column, value: Any)

Result: Column

Returns element of array at given index in value if column is array. Returns value for the given key in value if column is map.

2.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.318Z


encode

(encode expr charset)

Params: (value: Column, charset: String)

Result: Column

Computes the first argument into a binary from a string using the provided character set (one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16'). If either argument is null, the result will also be null.

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.319Z


exists

(exists expr predicate)

Params: (column: Column, f: (Column) ⇒ Column)

Result: Column

Returns whether a predicate holds for one or more elements in the array.

the input array column

col => predicate, the Boolean predicate to check the input column

3.0.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.322Z

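A sketch assuming the g alias and an array-of-numbers column :xs; the predicate is an ordinary Clojure function over a column:

;; True when any element of :xs is negative.
(-> df (g/with-column :any-negative? (g/exists :xs (fn [x] (g/< x (g/lit 0))))))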

exp

(exp expr)

Params: (e: Column)

Result: Column

Computes the exponential of the given value.

1.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.324Z


explode

(explode expr)

Params: (e: Column)

Result: Column

Creates a new row for each element in the given array or map column. Uses the default column name col for elements in the array and key and value for elements in the map unless specified otherwise.

1.3.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.325Z

Params: (e: Column)

Result: Column

Creates a new row for each element in the given array or map column.
Uses the default column name col for elements in the array and
key and value for elements in the map unless specified otherwise.


1.3.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.325Z
sourceraw docstring
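
A small sketch with the same g alias, assuming df has an :id column and an array column :xs (both hypothetical names):

;; one output row per element of :xs; the exploded column is named col by default
(-> df
    (g/select :id (g/explode :xs)))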

explode-outerclj

(explode-outer expr)

Params: (e: Column)

Result: Column

Creates a new row for each element in the given array or map column. Uses the default column name col for elements in the array and key and value for elements in the map unless specified otherwise.

1.3.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.325Z

Params: (e: Column)

Result: Column

Creates a new row for each element in the given array or map column.
Uses the default column name col for elements in the array and
key and value for elements in the map unless specified otherwise.


1.3.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.325Z
sourceraw docstring

expm-1clj

(expm-1 expr)

Params: (e: Column)

Result: Column

Computes the exponential of the given value minus one.

1.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.329Z

Params: (e: Column)

Result: Column

Computes the exponential of the given value minus one.


1.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.329Z
sourceraw docstring

expm1clj

(expm1 expr)

Params: (e: Column)

Result: Column

Computes the exponential of the given value minus one.

1.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.329Z

Params: (e: Column)

Result: Column

Computes the exponential of the given value minus one.


1.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.329Z
sourceraw docstring

exprclj

(expr s)

Params: (expr: String)

Result: Column

Parses the expression string into the column that it represents, similar to Dataset#selectExpr.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.330Z

Params: (expr: String)

Result: Column

Parses the expression string into the column that it represents, similar to
Dataset#selectExpr.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.330Z
sourceraw docstring
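
A sketch with the same g alias, assuming df has hypothetical columns age and name:

;; SQL expression strings become Columns
(-> df
    (g/select (g/expr "age + 1")
              (g/expr "upper(name) AS name_upper")))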

factorialclj

(factorial expr)

Params: (e: Column)

Result: Column

Computes the factorial of the given value.

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.331Z

Params: (e: Column)

Result: Column

Computes the factorial of the given value.


1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.331Z
sourceraw docstring

flattenclj

(flatten expr)

Params: (e: Column)

Result: Column

Creates a single array from an array of arrays. If a structure of nested arrays is deeper than two levels, only one level of nesting is removed.

2.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.345Z

Params: (e: Column)

Result: Column

Creates a single array from an array of arrays. If a structure of nested arrays is deeper than
two levels, only one level of nesting is removed.

2.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.345Z
sourceraw docstring

floorclj

(floor expr)

Params: (e: Column)

Result: Column

Computes the floor of the given value.

1.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.347Z

Params: (e: Column)

Result: Column

Computes the floor of the given value.


1.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.347Z
sourceraw docstring

forallclj

(forall expr predicate)

Params: (column: Column, f: (Column) ⇒ Column)

Result: Column

Returns whether a predicate holds for every element in the array.

the input array column

col => predicate, the Boolean predicate to check the input column

3.0.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.349Z

Params: (column: Column, f: (Column) ⇒ Column)

Result: Column

Returns whether a predicate holds for every element in the array.

the input array column

col => predicate, the Boolean predicate to check the input column

3.0.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.349Z
sourceraw docstring

format-numberclj

(format-number expr decimal-places)

Params: (x: Column, d: Int)

Result: Column

Formats numeric column x to a format like '#,###,###.##', rounded to d decimal places with HALF_EVEN round mode, and returns the result as a string column.

If d is 0, the result has no decimal point or fractional part. If d is less than 0, the result will be null.

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.350Z

Params: (x: Column, d: Int)

Result: Column

Formats numeric column x to a format like '#,###,###.##', rounded to d decimal places
with HALF_EVEN round mode, and returns the result as a string column.

If d is 0, the result has no decimal point or fractional part.
If d is less than 0, the result will be null.


1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.350Z
sourceraw docstring
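
A sketch with the same g alias, assuming a hypothetical numeric column :revenue:

;; 1234567.891 -> "1,234,567.89" (rounded HALF_EVEN to 2 decimal places)
(-> df
    (g/with-column :revenue-str (g/format-number :revenue 2)))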

format-stringclj

(format-string fmt & exprs)

Params: (format: String, arguments: Column*)

Result: Column

Formats the arguments in printf-style and returns the result as a string column.

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.351Z

Params: (format: String, arguments: Column*)

Result: Column

Formats the arguments in printf-style and returns the result as a string column.


1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.351Z
sourceraw docstring
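
A sketch with the same g alias, assuming hypothetical columns :item and :price:

;; printf-style formatting over columns
(-> df
    (g/with-column :label (g/format-string "%s: %.2f" :item :price)))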

from-csvclj

(from-csv expr schema)
(from-csv expr schema options)

Params: (e: Column, schema: StructType, options: Map[String, String])

Result: Column

Parses a column containing a CSV string into a StructType with the specified schema. Returns null, in the case of an unparseable string.

a string column containing CSV data.

the schema to use when parsing the CSV string

options to control how the CSV is parsed. Accepts the same options as the CSV data source.

3.0.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.354Z

Params: (e: Column, schema: StructType, options: Map[String, String])

Result: Column

Parses a column containing a CSV string into a StructType with the specified schema.
Returns null, in the case of an unparseable string.


a string column containing CSV data.

the schema to use when parsing the CSV string

options to control how the CSV is parsed. Accepts the same options as the
               CSV data source.

3.0.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.354Z
sourceraw docstring

from-jsonclj

(from-json expr schema)
(from-json expr schema options)

Params: (e: Column, schema: StructType, options: Map[String, String])

Result: Column

(Scala-specific) Parses a column containing a JSON string into a StructType with the specified schema. Returns null, in the case of an unparseable string.

a string column containing JSON data.

the schema to use when parsing the json string

options to control how the json is parsed. Accepts the same options as the json data source.

2.1.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.372Z

Params: (e: Column, schema: StructType, options: Map[String, String])

Result: Column

(Scala-specific) Parses a column containing a JSON string into a StructType with the
specified schema. Returns null, in the case of an unparseable string.


a string column containing JSON data.

the schema to use when parsing the json string

options to control how the json is parsed. Accepts the same options as the
               json data source.

2.1.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.372Z
sourceraw docstring

from-unixtimeclj

(from-unixtime expr)
(from-unixtime expr fmt)

Params: (ut: Column)

Result: Column

Converts the number of seconds from unix epoch (1970-01-01 00:00:00 UTC) to a string representing the timestamp of that moment in the current system time zone in the yyyy-MM-dd HH:mm:ss format.

A number of a type that is castable to a long, such as string or integer. Can be negative for timestamps before the unix epoch

A string, or null if the input was a string that could not be cast to a long

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.375Z

Params: (ut: Column)

Result: Column

Converts the number of seconds from unix epoch (1970-01-01 00:00:00 UTC) to a string
representing the timestamp of that moment in the current system time zone in the
yyyy-MM-dd HH:mm:ss format.


A number of a type that is castable to a long, such as string or integer. Can be
          negative for timestamps before the unix epoch

A string, or null if the input was a string that could not be cast to a long

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.375Z
sourceraw docstring
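
A sketch with the same g alias, assuming a hypothetical column :epoch holding seconds since the unix epoch:

(-> df
    (g/select (g/from-unixtime :epoch)                ;; default yyyy-MM-dd HH:mm:ss
              (g/from-unixtime :epoch "yyyy-MM-dd"))) ;; two-arg arity with a custom format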

greatestclj

(greatest & exprs)

Params: (exprs: Column*)

Result: Column

Returns the greatest value of the list of values, skipping null values. This function takes at least 2 parameters. It will return null iff all parameters are null.

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.382Z

Params: (exprs: Column*)

Result: Column

Returns the greatest value of the list of values, skipping null values.
This function takes at least 2 parameters. It will return null iff all parameters are null.


1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.382Z
sourceraw docstring
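
A sketch with the same g alias, assuming hypothetical numeric columns :q1, :q2 and :q3:

;; per-row maximum, skipping nulls; null only when all three are null
(-> df
    (g/with-column :best-quarter (g/greatest :q1 :q2 :q3)))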

groupingclj

(grouping expr)

Params: (e: Column)

Result: Column

Aggregate function: indicates whether a specified column in a GROUP BY list is aggregated or not, returns 1 for aggregated or 0 for not aggregated in the result set.

2.0.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.388Z

Params: (e: Column)

Result: Column

Aggregate function: indicates whether a specified column in a GROUP BY list is aggregated
or not, returns 1 for aggregated or 0 for not aggregated in the result set.


2.0.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.388Z
sourceraw docstring

grouping-idclj

(grouping-id & exprs)

Params: (cols: Column*)

Result: Column

Aggregate function: returns the level of grouping, equals to

2.0.0

The list of columns should exactly match the grouping columns, or be empty (meaning all the grouping columns).

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.390Z

Params: (cols: Column*)

Result: Column

Aggregate function: returns the level of grouping, equals to

2.0.0

The list of columns should exactly match the grouping columns, or be empty (meaning all the
grouping columns).

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.390Z
sourceraw docstring

hashclj

(hash & exprs)

Params: (cols: Column*)

Result: Column

Calculates the hash code of given columns, and returns the result as an int column.

2.0.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.391Z

Params: (cols: Column*)

Result: Column

Calculates the hash code of given columns, and returns the result as an int column.


2.0.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.391Z
sourceraw docstring

hexclj

(hex expr)

Params: (column: Column)

Result: Column

Computes hex value of the given column.

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.393Z

Params: (column: Column)

Result: Column

Computes hex value of the given column.


1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.393Z
sourceraw docstring

hourclj

(hour expr)

Params: (e: Column)

Result: Column

Extracts the hours as an integer from a given date/timestamp/string.

An integer, or null if the input was a string that could not be cast to a date

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.394Z

Params: (e: Column)

Result: Column

Extracts the hours as an integer from a given date/timestamp/string.

An integer, or null if the input was a string that could not be cast to a date

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.394Z
sourceraw docstring

hypotclj

(hypot left-expr right-expr)

Params: (l: Column, r: Column)

Result: Column

Computes sqrt(a² + b²) without intermediate overflow or underflow.

1.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.406Z

Params: (l: Column, r: Column)

Result: Column

Computes sqrt(a² + b²) without intermediate overflow or underflow.


1.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.406Z
sourceraw docstring

initcapclj

(initcap expr)

Params: (e: Column)

Result: Column

Returns a new string column by converting the first letter of each word to uppercase. Words are delimited by whitespace.

For example, "hello world" will become "Hello World".

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.407Z

Params: (e: Column)

Result: Column

Returns a new string column by converting the first letter of each word to uppercase.
Words are delimited by whitespace.

For example, "hello world" will become "Hello World".


1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.407Z
sourceraw docstring

input-file-nameclj

(input-file-name)

Params: ()

Result: Column

Creates a string column for the file name of the current Spark task.

1.6.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.408Z

Params: ()

Result: Column

Creates a string column for the file name of the current Spark task.


1.6.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.408Z
sourceraw docstring

instrclj

(instr expr substr)

Params: (str: Column, substring: String)

Result: Column

Locate the position of the first occurrence of substr column in the given string. Returns null if either of the arguments is null.

1.5.0

The position is not zero based, but 1 based index. Returns 0 if substr could not be found in str.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.409Z

Params: (str: Column, substring: String)

Result: Column

Locate the position of the first occurrence of substr column in the given string.
Returns null if either of the arguments is null.


1.5.0

The position is not zero based, but 1 based index. Returns 0 if substr
could not be found in str.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.409Z
sourceraw docstring
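
A sketch with the same g alias, assuming a hypothetical string column :sentence:

;; 1-based position of the first occurrence of "spark"; 0 when it is absent
(-> df
    (g/with-column :pos (g/instr :sentence "spark")))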

kurtosisclj

(kurtosis expr)

Params: (e: Column)

Result: Column

Aggregate function: returns the kurtosis of the values in a group.

1.6.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.416Z

Params: (e: Column)

Result: Column

Aggregate function: returns the kurtosis of the values in a group.


1.6.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.416Z
sourceraw docstring

lagclj

(lag expr offset)
(lag expr offset default)

Params: (e: Column, offset: Int)

Result: Column

Window function: returns the value that is offset rows before the current row, and null if there is less than offset rows before the current row. For example, an offset of one will return the previous row at any given point in the window partition.

This is equivalent to the LAG function in SQL.

1.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.421Z

Params: (e: Column, offset: Int)

Result: Column

Window function: returns the value that is offset rows before the current row, and
null if there is less than offset rows before the current row. For example,
an offset of one will return the previous row at any given point in the window partition.

This is equivalent to the LAG function in SQL.


1.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.421Z
sourceraw docstring
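
Window functions such as lag need a window specification. The sketch below reuses the g alias and assumes Geni's window helpers are available as g/window and g/over (treat those names, and the :symbol, :date and :close columns, as assumptions for illustration):

;; previous row's :close within each :symbol, ordered by :date
(def w (g/window {:partition-by :symbol :order-by :date}))
(-> df
    (g/with-column :prev-close (g/over (g/lag :close 1) w)))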

last-dayclj

(last-day expr)

Params: (e: Column)

Result: Column

Returns the last day of the month which the given date belongs to. For example, input "2015-07-27" returns "2015-07-31" since July 31 is the last day of the month in July 2015.

A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS

A date, or null if the input was a string that could not be cast to a date

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.431Z

Params: (e: Column)

Result: Column

Returns the last day of the month which the given date belongs to.
For example, input "2015-07-27" returns "2015-07-31" since July 31 is the last day of the
month in July 2015.


A date, timestamp or string. If a string, the data must be in a format that can be
         cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS

A date, or null if the input was a string that could not be cast to a date

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.431Z
sourceraw docstring

leadclj

(lead expr offset)
(lead expr offset default)

Params: (columnName: String, offset: Int)

Result: Column

Window function: returns the value that is offset rows after the current row, and null if there is less than offset rows after the current row. For example, an offset of one will return the next row at any given point in the window partition.

This is equivalent to the LEAD function in SQL.

1.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.437Z

Params: (columnName: String, offset: Int)

Result: Column

Window function: returns the value that is offset rows after the current row, and
null if there is less than offset rows after the current row. For example,
an offset of one will return the next row at any given point in the window partition.

This is equivalent to the LEAD function in SQL.


1.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.437Z
sourceraw docstring

leastclj

(least & exprs)

Params: (exprs: Column*)

Result: Column

Returns the least value of the list of values, skipping null values. This function takes at least 2 parameters. It will return null iff all parameters are null.

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.439Z

Params: (exprs: Column*)

Result: Column

Returns the least value of the list of values, skipping null values.
This function takes at least 2 parameters. It will return null iff all parameters are null.


1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.439Z
sourceraw docstring

lengthclj

(length expr)

Params: (e: Column)

Result: Column

Computes the character length of a given string or number of bytes of a binary string. The length of character strings includes trailing spaces. The length of binary strings includes binary zeros.

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.440Z

Params: (e: Column)

Result: Column

Computes the character length of a given string or number of bytes of a binary string.
The length of character strings includes trailing spaces. The length of binary strings
includes binary zeros.


1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.440Z
sourceraw docstring

levenshteinclj

(levenshtein left-expr right-expr)

Params: (l: Column, r: Column)

Result: Column

Computes the Levenshtein distance of the two given string columns.

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.441Z

Params: (l: Column, r: Column)

Result: Column

Computes the Levenshtein distance of the two given string columns.

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.441Z
sourceraw docstring

locateclj

(locate substr expr)

Params: (substr: String, str: Column)

Result: Column

Locate the position of the first occurrence of substr.

1.5.0

The position is not zero based, but 1 based index. Returns 0 if substr could not be found in str.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.445Z

Params: (substr: String, str: Column)

Result: Column

Locate the position of the first occurrence of substr.


1.5.0

The position is not zero based, but 1 based index. Returns 0 if substr
could not be found in str.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.445Z
sourceraw docstring

logclj

(log expr)

Params: (e: Column)

Result: Column

Computes the natural logarithm of the given value.

1.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.449Z

Params: (e: Column)

Result: Column

Computes the natural logarithm of the given value.


1.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.449Z
sourceraw docstring

log-10clj

(log-10 expr)

Params: (e: Column)

Result: Column

Computes the logarithm of the given value in base 10.

1.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.451Z

Params: (e: Column)

Result: Column

Computes the logarithm of the given value in base 10.


1.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.451Z
sourceraw docstring

log-1pclj

(log-1p expr)

Params: (e: Column)

Result: Column

Computes the natural logarithm of the given value plus one.

1.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.453Z

Params: (e: Column)

Result: Column

Computes the natural logarithm of the given value plus one.


1.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.453Z
sourceraw docstring

log-2clj

(log-2 expr)

Params: (expr: Column)

Result: Column

Computes the logarithm of the given column in base 2.

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.455Z

Params: (expr: Column)

Result: Column

Computes the logarithm of the given column in base 2.


1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.455Z
sourceraw docstring

log10clj

(log10 expr)

Params: (e: Column)

Result: Column

Computes the logarithm of the given value in base 10.

1.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.451Z

Params: (e: Column)

Result: Column

Computes the logarithm of the given value in base 10.


1.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.451Z
sourceraw docstring

log1pclj

(log1p expr)

Params: (e: Column)

Result: Column

Computes the natural logarithm of the given value plus one.

1.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.453Z

Params: (e: Column)

Result: Column

Computes the natural logarithm of the given value plus one.


1.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.453Z
sourceraw docstring

log2clj

(log2 expr)

Params: (expr: Column)

Result: Column

Computes the logarithm of the given column in base 2.

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.455Z

Params: (expr: Column)

Result: Column

Computes the logarithm of the given column in base 2.


1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.455Z
sourceraw docstring

lowerclj

(lower expr)

Params: (e: Column)

Result: Column

Converts a string column to lower case.

1.3.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.457Z

Params: (e: Column)

Result: Column

Converts a string column to lower case.


1.3.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.457Z
sourceraw docstring

lpadclj

(lpad expr length pad)

Params: (str: Column, len: Int, pad: String)

Result: Column

Left-pad the string column with pad to a length of len. If the string column is longer than len, the return value is shortened to len characters.

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.458Z

Params: (str: Column, len: Int, pad: String)

Result: Column

Left-pad the string column with pad to a length of len. If the string column is longer
than len, the return value is shortened to len characters.


1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.458Z
sourceraw docstring
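
A sketch with the same g alias, assuming a hypothetical string column :id:

;; "42" -> "00000042"; values longer than 8 characters are shortened to 8
(-> df
    (g/with-column :padded-id (g/lpad :id 8 "0")))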

ltrimclj

(ltrim expr)

Params: (e: Column)

Result: Column

Trim the spaces from left end for the specified string value.

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.460Z

Params: (e: Column)

Result: Column

Trim the spaces from left end for the specified string value.


1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.460Z
sourceraw docstring

mapclj

(map & exprs)

Params: (cols: Column*)

Result: Column

Creates a new map column. The input columns must be grouped as key-value pairs, e.g. (key1, value1, key2, value2, ...). The key columns must all have the same data type, and can't be null. The value columns must all have the same data type.

2.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.461Z

Params: (cols: Column*)

Result: Column

Creates a new map column. The input columns must be grouped as key-value pairs, e.g.
(key1, value1, key2, value2, ...). The key columns must all have the same data type, and can't
be null. The value columns must all have the same data type.


2.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.461Z
sourceraw docstring
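
A sketch with the same g alias, assuming hypothetical columns :latitude and :longitude; the string keys are wrapped with g/lit so the key columns share a non-null type:

;; map column {"lat" -> latitude, "lng" -> longitude}
(-> df
    (g/with-column :coords (g/map (g/lit "lat") :latitude
                                  (g/lit "lng") :longitude)))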

map-concatclj

(map-concat & exprs)

Params: (cols: Column*)

Result: Column

Returns the union of all the given maps.

2.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.462Z

Params: (cols: Column*)

Result: Column

Returns the union of all the given maps.

2.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.462Z
sourceraw docstring

map-entriesclj

(map-entries expr)

Params: (e: Column)

Result: Column

Returns an unordered array of all entries in the given map.

3.0.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.463Z

Params: (e: Column)

Result: Column

Returns an unordered array of all entries in the given map.

3.0.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.463Z
sourceraw docstring

map-filterclj

(map-filter expr predicate)

Params: (expr: Column, f: (Column, Column) ⇒ Column)

Result: Column

Returns a map whose key-value pairs satisfy a predicate.

the input map column

(key, value) => predicate, the Boolean predicate to filter the input map column

3.0.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.465Z

Params: (expr: Column, f: (Column, Column) ⇒ Column)

Result: Column

Returns a map whose key-value pairs satisfy a predicate.

the input map column

(key, value) => predicate, the Boolean predicate to filter the input map column

3.0.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.465Z
sourceraw docstring

map-from-arraysclj

(map-from-arrays key-expr val-expr)

Params: (keys: Column, values: Column)

Result: Column

Creates a new map column. The array in the first column is used for keys. The array in the second column is used for values. All elements in the array for key should not be null.

2.4

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.470Z

Params: (keys: Column, values: Column)

Result: Column

Creates a new map column. The array in the first column is used for keys. The array in the
second column is used for values. All elements in the array for key should not be null.


2.4

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.470Z
sourceraw docstring

map-from-entriesclj

(map-from-entries expr)

Params: (e: Column)

Result: Column

Returns a map created from the given array of entries.

2.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.471Z

Params: (e: Column)

Result: Column

Returns a map created from the given array of entries.

2.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.471Z
sourceraw docstring

map-keysclj

(map-keys expr)

Params: (e: Column)

Result: Column

Returns an unordered array containing the keys of the map.

2.3.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.472Z

Params: (e: Column)

Result: Column

Returns an unordered array containing the keys of the map.

2.3.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.472Z
sourceraw docstring

map-valuesclj

(map-values expr)

Params: (e: Column)

Result: Column

Returns an unordered array containing the values of the map.

2.3.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.473Z

Params: (e: Column)

Result: Column

Returns an unordered array containing the values of the map.

2.3.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.473Z
sourceraw docstring

map-zip-withclj

(map-zip-with left right merge-fn)

Params: (left: Column, right: Column, f: (Column, Column, Column) ⇒ Column)

Result: Column

Merge two given maps, key-wise into a single map using a function.

the left input map column

the right input map column

(key, value1, value2) => new_value, the lambda function to merge the map values

3.0.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.474Z

Params: (left: Column, right: Column, f: (Column, Column, Column) ⇒ Column)

Result: Column

Merge two given maps, key-wise into a single map using a function.

the left input map column

the right input map column

(key, value1, value2) => new_value, the lambda function to merge the map values

3.0.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.474Z
sourceraw docstring

md-5clj

(md-5 expr)

Params: (e: Column)

Result: Column

Calculates the MD5 digest of a binary column and returns the value as a 32 character hex string.

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.478Z

Params: (e: Column)

Result: Column

Calculates the MD5 digest of a binary column and returns the value
as a 32 character hex string.


1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.478Z
sourceraw docstring

md5clj

(md5 expr)

Params: (e: Column)

Result: Column

Calculates the MD5 digest of a binary column and returns the value as a 32 character hex string.

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.478Z

Params: (e: Column)

Result: Column

Calculates the MD5 digest of a binary column and returns the value
as a 32 character hex string.


1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.478Z
sourceraw docstring

minuteclj

(minute expr)

Params: (e: Column)

Result: Column

Extracts the minutes as an integer from a given date/timestamp/string.

An integer, or null if the input was a string that could not be cast to a date

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.483Z

Params: (e: Column)

Result: Column

Extracts the minutes as an integer from a given date/timestamp/string.

An integer, or null if the input was a string that could not be cast to a date

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.483Z
sourceraw docstring

monotonically-increasing-idclj

(monotonically-increasing-id)

Params: ()

Result: Column

A column expression that generates monotonically increasing 64-bit integers.

The generated ID is guaranteed to be monotonically increasing and unique, but not consecutive. The current implementation puts the partition ID in the upper 31 bits, and the record number within each partition in the lower 33 bits. The assumption is that the data frame has less than 1 billion partitions, and each partition has less than 8 billion records.

As an example, consider a DataFrame with two partitions, each with 3 records. This expression would return the following IDs: 0, 1, 2, 8589934592 (1L << 33), 8589934593, 8589934594.

(Since version 2.0.0) Use monotonically_increasing_id()

1.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.744Z

Params: ()

Result: Column

A column expression that generates monotonically increasing 64-bit integers.

The generated ID is guaranteed to be monotonically increasing and unique, but not consecutive.
The current implementation puts the partition ID in the upper 31 bits, and the record number
within each partition in the lower 33 bits. The assumption is that the data frame has
less than 1 billion partitions, and each partition has less than 8 billion records.

As an example, consider a DataFrame with two partitions, each with 3 records.
This expression would return the following IDs: 0, 1, 2, 8589934592 (1L << 33), 8589934593, 8589934594.

(Since version 2.0.0) Use monotonically_increasing_id()

1.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.744Z
sourceraw docstring

monthclj

(month expr)

Params: (e: Column)

Result: Column

Extracts the month as an integer from a given date/timestamp/string.

An integer, or null if the input was a string that could not be cast to a date

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.486Z

Params: (e: Column)

Result: Column

Extracts the month as an integer from a given date/timestamp/string.

An integer, or null if the input was a string that could not be cast to a date

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.486Z
sourceraw docstring

months-betweenclj

(months-between l-expr r-expr)

Params: (end: Column, start: Column)

Result: Column

Returns number of months between dates start and end.

A whole number is returned if both inputs have the same day of month or both are the last day of their respective months. Otherwise, the difference is calculated assuming 31 days per month.

For example:

A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS

A date, timestamp or string. If a string, the data must be in a format that can cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS

A double, or null if either end or start were strings that could not be cast to a timestamp. Negative if end is before start

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.490Z

Params: (end: Column, start: Column)

Result: Column

Returns number of months between dates start and end.

A whole number is returned if both inputs have the same day of month or both are the last day
of their respective months. Otherwise, the difference is calculated assuming 31 days per month.

For example:

A date, timestamp or string. If a string, the data must be in a format that can
             be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS

A date, timestamp or string. If a string, the data must be in a format that can
             cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS

A double, or null if either end or start were strings that could not be cast to a
        timestamp. Negative if end is before start

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.490Z
sourceraw docstring

nanvlclj

(nanvl left-expr right-expr)

Params: (col1: Column, col2: Column)

Result: Column

Returns col1 if it is not NaN, or col2 if col1 is NaN.

Both inputs should be floating point columns (DoubleType or FloatType).

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.492Z

Params: (col1: Column, col2: Column)

Result: Column

Returns col1 if it is not NaN, or col2 if col1 is NaN.

Both inputs should be floating point columns (DoubleType or FloatType).


1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.492Z
sourceraw docstring

negateclj

(negate expr)

Params: (e: Column)

Result: Column

Unary minus, i.e. negate the expression.

1.3.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.494Z

Params: (e: Column)

Result: Column

Unary minus, i.e. negate the expression.

1.3.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.494Z
sourceraw docstring

next-dayclj

(next-day expr day-of-week)

Params: (date: Column, dayOfWeek: String)

Result: Column

Returns the first date which is later than the value of the date column that is on the specified day of the week.

For example, next_day('2015-07-27', "Sunday") returns 2015-08-02 because that is the first Sunday after 2015-07-27.

A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS

Case insensitive, and accepts: "Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"

A date, or null if date was a string that could not be cast to a date or if dayOfWeek was an invalid value

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.495Z

Params: (date: Column, dayOfWeek: String)

Result: Column

Returns the first date which is later than the value of the date column that is on the
specified day of the week.

For example, next_day('2015-07-27', "Sunday") returns 2015-08-02 because that is the first
Sunday after 2015-07-27.


A date, timestamp or string. If a string, the data must be in a format that
                 can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS

Case insensitive, and accepts: "Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"

A date, or null if date was a string that could not be cast to a date or if
        dayOfWeek was an invalid value

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.495Z
sourceraw docstring
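
A sketch with the same g alias, assuming a hypothetical date column :date:

;; first Sunday strictly after each :date
(-> df
    (g/with-column :next-sunday (g/next-day :date "Sunday")))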

notclj

(not expr)

Params: (e: Column)

Result: Column

Inversion of boolean expression, i.e. NOT.

1.3.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.497Z

Params: (e: Column)

Result: Column

Inversion of boolean expression, i.e. NOT.

1.3.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.497Z
sourceraw docstring

ntileclj

(ntile n)

Params: (n: Int)

Result: Column

Window function: returns the ntile group id (from 1 to n inclusive) in an ordered window partition. For example, if n is 4, the first quarter of the rows will get value 1, the second quarter will get 2, the third quarter will get 3, and the last quarter will get 4.

This is equivalent to the NTILE function in SQL.

1.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.500Z

Params: (n: Int)

Result: Column

Window function: returns the ntile group id (from 1 to n inclusive) in an ordered window
partition. For example, if n is 4, the first quarter of the rows will get value 1, the second
quarter will get 2, the third quarter will get 3, and the last quarter will get 4.

This is equivalent to the NTILE function in SQL.


1.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.500Z
sourceraw docstring

overlayclj

(overlay src rep pos)
(overlay src rep pos len)

Params: (src: Column, replace: Column, pos: Column, len: Column)

Result: Column

Overlay the specified portion of src with replace, starting from byte position pos of src and proceeding for len bytes.

3.0.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.503Z

Params: (src: Column, replace: Column, pos: Column, len: Column)

Result: Column

Overlay the specified portion of src with replace,
 starting from byte position pos of src and proceeding for len bytes.


3.0.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.503Z
sourceraw docstring

percent-rankclj

(percent-rank)

Params: ()

Result: Column

Window function: returns the relative rank (i.e. percentile) of rows within a window partition.

This is computed by: (rank of row in its partition - 1) / (number of rows in the partition - 1).

This is equivalent to the PERCENT_RANK function in SQL.

1.6.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.504Z

Params: ()

Result: Column

Window function: returns the relative rank (i.e. percentile) of rows within a window partition.

This is computed by: (rank of row in its partition - 1) / (number of rows in the partition - 1).

This is equivalent to the PERCENT_RANK function in SQL.


1.6.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.504Z
sourceraw docstring

piclj

The double value that is closer than any other to pi, the ratio of the circumference of a circle to its diameter.

The double value that is closer than any other to pi, the ratio of the circumference of a circle to its diameter.
sourceraw docstring

pmodclj

(pmod left-expr right-expr)

Params: (dividend: Column, divisor: Column)

Result: Column

Returns the positive value of dividend mod divisor.

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.505Z

Params: (dividend: Column, divisor: Column)

Result: Column

Returns the positive value of dividend mod divisor.


1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.505Z
sourceraw docstring

posexplodeclj

(posexplode expr)

Params: (e: Column)

Result: Column

Creates a new row for each element with position in the given array or map column. Uses the default column name pos for position, and col for elements in the array and key and value for elements in the map unless specified otherwise.

2.1.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.506Z

Params: (e: Column)

Result: Column

Creates a new row for each element with position in the given array or map column.
Uses the default column name pos for position, and col for elements in the array
and key and value for elements in the map unless specified otherwise.


2.1.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.506Z
sourceraw docstring

posexplode-outerclj

(posexplode-outer expr)

Params: (e: Column)

Result: Column

Creates a new row for each element with position in the given array or map column. Uses the default column name pos for position, and col for elements in the array and key and value for elements in the map unless specified otherwise.

2.1.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.506Z

Params: (e: Column)

Result: Column

Creates a new row for each element with position in the given array or map column.
Uses the default column name pos for position, and col for elements in the array
and key and value for elements in the map unless specified otherwise.


2.1.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.506Z
sourceraw docstring

powclj

(pow base exponent)

Params: (l: Column, r: Column)

Result: Column

Returns the value of the first argument raised to the power of the second argument.

1.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.520Z

Params: (l: Column, r: Column)

Result: Column

Returns the value of the first argument raised to the power of the second argument.


1.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.520Z
sourceraw docstring
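
A sketch with the same g alias, assuming a hypothetical numeric column :x; whether a bare number is coerced to a literal column is an assumption, so (g/lit 2) is the safe spelling:

(-> df
    (g/with-column :x-squared (g/pow :x (g/lit 2))))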

quarterclj

(quarter expr)

Params: (e: Column)

Result: Column

Extracts the quarter as an integer from a given date/timestamp/string.

An integer, or null if the input was a string that could not be cast to a date

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.521Z

Params: (e: Column)

Result: Column

Extracts the quarter as an integer from a given date/timestamp/string.

An integer, or null if the input was a string that could not be cast to a date

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.521Z
sourceraw docstring

radiansclj

(radians expr)

Params: (e: Column)

Result: Column

Converts an angle measured in degrees to an approximately equivalent angle measured in radians.

angle in degrees

angle in radians, as if computed by java.lang.Math.toRadians

2.1.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.523Z

Params: (e: Column)

Result: Column

Converts an angle measured in degrees to an approximately equivalent angle measured in radians.


angle in degrees

angle in radians, as if computed by java.lang.Math.toRadians

2.1.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.523Z
sourceraw docstring

randclj

(rand)
(rand seed)

Params: (seed: Long)

Result: Column

Generate a random column with independent and identically distributed (i.i.d.) samples uniformly distributed in [0.0, 1.0).

1.4.0

The function is non-deterministic in the general case.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.526Z

Params: (seed: Long)

Result: Column

Generate a random column with independent and identically distributed (i.i.d.) samples
uniformly distributed in [0.0, 1.0).


1.4.0

The function is non-deterministic in the general case.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.526Z
sourceraw docstring

randnclj

(randn)
(randn seed)

Params: (seed: Long)

Result: Column

Generate a column with independent and identically distributed (i.i.d.) samples from the standard normal distribution.

1.4.0

The function is non-deterministic in the general case.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.528Z

Params: (seed: Long)

Result: Column

Generate a column with independent and identically distributed (i.i.d.) samples from
the standard normal distribution.


1.4.0

The function is non-deterministic in the general case.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.528Z
sourceraw docstring

rankclj

(rank)

Params: ()

Result: Column

Window function: returns the rank of rows within a window partition.

The difference between rank and dense_rank is that dense_rank leaves no gaps in the ranking sequence when there are ties. That is, if you were ranking a competition using dense_rank and had three people tie for second place, you would say that all three were in second place and that the next person came in third. Rank, by contrast, gives sequential numbers, so the person that came in third place (after the ties) would register as coming in fifth.

This is equivalent to the RANK function in SQL.

1.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.529Z

Params: ()

Result: Column

Window function: returns the rank of rows within a window partition.

The difference between rank and dense_rank is that dense_rank leaves no gaps in the ranking
sequence when there are ties. That is, if you were ranking a competition using dense_rank
and had three people tie for second place, you would say that all three were in second
place and that the next person came in third. Rank, by contrast, gives sequential numbers,
so the person that came in third place (after the ties) would register as coming in fifth.

This is equivalent to the RANK function in SQL.


1.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.529Z
sourceraw docstring

regexp-extractclj

(regexp-extract expr regex idx)

Params: (e: Column, exp: String, groupIdx: Int)

Result: Column

Extract a specific group matched by a Java regex, from the specified string column. If the regex did not match, or the specified group did not match, an empty string is returned.

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.530Z

Params: (e: Column, exp: String, groupIdx: Int)

Result: Column

Extract a specific group matched by a Java regex, from the specified string column.
If the regex did not match, or the specified group did not match, an empty string is returned.


1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.530Z
sourceraw docstring
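
A sketch with the same g alias, assuming a hypothetical string column :log:

;; captures the first group; yields "" when the regex does not match
(-> df
    (g/with-column :user (g/regexp-extract :log "user=(\\w+)" 1)))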

regexp-replaceclj

(regexp-replace expr pattern-expr replacement-expr)

Params: (e: Column, pattern: String, replacement: String)

Result: Column

Replace all substrings of the specified string value that match regexp with rep.

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.532Z

Params: (e: Column, pattern: String, replacement: String)

Result: Column

Replace all substrings of the specified string value that match regexp with rep.


1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.532Z
sourceraw docstring

reverseclj

(reverse expr)

Params: (e: Column)

Result: Column

Returns a reversed string or an array with reverse order of elements.

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.534Z

Params: (e: Column)

Result: Column

Returns a reversed string or an array with reverse order of elements.

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.534Z
sourceraw docstring

rintclj

(rint expr)

Params: (e: Column)

Result: Column

Returns the double value that is closest in value to the argument and is equal to a mathematical integer.

1.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.536Z

Params: (e: Column)

Result: Column

Returns the double value that is closest in value to the argument and
is equal to a mathematical integer.


1.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.536Z
sourceraw docstring

roundclj

(round expr)

Params: (e: Column)

Result: Column

Returns the value of the column e rounded to 0 decimal places with HALF_UP round mode.

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.539Z

Params: (e: Column)

Result: Column

Returns the value of the column e rounded to 0 decimal places with HALF_UP round mode.


1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.539Z
sourceraw docstring
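
For example, with a double column :price on a hypothetical dataframe df, HALF_UP rounding takes 2.5 to 3.0 and -2.5 to -3.0:

(require '[zero-one.geni.core :as g])

;; round to 0 decimal places with HALF_UP
(-> df
    (g/select (g/round :price)))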

row-numberclj

(row-number)

Params: ()

Result: Column

Window function: returns a sequential number starting at 1 within a window partition.

1.6.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.540Z

Params: ()

Result: Column

Window function: returns a sequential number starting at 1 within a window partition.


1.6.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.540Z
sourceraw docstring
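
A hedged sketch, assuming a hypothetical dataframe df with :group and :created-at columns plus geni's g/with-column, g/over and g/window helpers, numbering rows 1, 2, 3, ... within each partition:

(require '[zero-one.geni.core :as g])

(-> df
    (g/with-column :row
      (g/over (g/row-number) (g/window {:partition-by :group :order-by :created-at}))))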

rpadclj

(rpad expr length pad)

Params: (str: Column, len: Int, pad: String)

Result: Column

Right-pad the string column with pad to a length of len. If the string column is longer than len, the return value is shortened to len characters.

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.541Z

Params: (str: Column, len: Int, pad: String)

Result: Column

Right-pad the string column with pad to a length of len. If the string column is longer
than len, the return value is shortened to len characters.


1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.541Z
sourceraw docstring
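
For example, padding a string column :code of a hypothetical dataframe df to exactly 8 characters; values longer than 8 are truncated:

(require '[zero-one.geni.core :as g])

;; "A1" -> "A1000000", "ABCDEFGHIJ" -> "ABCDEFGH"
(-> df
    (g/select (g/rpad :code 8 "0")))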

rtrimclj

(rtrim expr)

Params: (e: Column)

Result: Column

Trim the spaces from right end for the specified string value.

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.543Z

Params: (e: Column)

Result: Column

Trim the spaces from right end for the specified string value.


1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.543Z
sourceraw docstring

schema-of-csvclj

(schema-of-csv expr)
(schema-of-csv expr options)

Params: (csv: String)

Result: Column

Parses a CSV string and infers its schema in DDL format.

a CSV string.

3.0.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.547Z

Params: (csv: String)

Result: Column

Parses a CSV string and infers its schema in DDL format.


a CSV string.

3.0.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.547Z
sourceraw docstring

schema-of-jsonclj

(schema-of-json expr)
(schema-of-json expr options)

Params: (json: String)

Result: Column

Parses a JSON string and infers its schema in DDL format.

a JSON string.

2.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.554Z

Params: (json: String)

Result: Column

Parses a JSON string and infers its schema in DDL format.


a JSON string.

2.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.554Z
sourceraw docstring

secondclj

(second expr)

Params: (e: Column)

Result: Column

Extracts the seconds as an integer from a given date/timestamp/string.

An integer, or null if the input was a string that could not be cast to a timestamp

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.555Z

Params: (e: Column)

Result: Column

Extracts the seconds as an integer from a given date/timestamp/string.

An integer, or null if the input was a string that could not be cast to a timestamp

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.555Z
sourceraw docstring

sequenceclj

(sequence start stop step)

Params: (start: Column, stop: Column, step: Column)

Result: Column

Generate a sequence of integers from start to stop, incrementing by step.

2.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.557Z

Params: (start: Column, stop: Column, step: Column)

Result: Column

Generate a sequence of integers from start to stop, incrementing by step.


2.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.557Z
sourceraw docstring
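
Since all three arguments are column expressions, literal bounds need g/lit; a minimal sketch on a hypothetical dataframe df:

(require '[zero-one.geni.core :as g])

;; produces the array [1 3 5 7 9] for every row
(-> df
    (g/select (g/sequence (g/lit 1) (g/lit 10) (g/lit 2))))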

sha-1clj

(sha-1 expr)

Params: (e: Column)

Result: Column

Calculates the SHA-1 digest of a binary column and returns the value as a 40 character hex string.

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.558Z

Params: (e: Column)

Result: Column

Calculates the SHA-1 digest of a binary column and returns the value
as a 40 character hex string.


1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.558Z
sourceraw docstring

sha-2clj

(sha-2 expr n-bits)

Params: (e: Column, numBits: Int)

Result: Column

Calculates the SHA-2 family of hash functions of a binary column and returns the value as a hex string.

column to compute SHA-2 on.

one of 224, 256, 384, or 512.

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.559Z

Params: (e: Column, numBits: Int)

Result: Column

Calculates the SHA-2 family of hash functions of a binary column and
returns the value as a hex string.


column to compute SHA-2 on.

one of 224, 256, 384, or 512.

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.559Z
sourceraw docstring
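
For example, hashing a column :payload of a hypothetical dataframe df with SHA-256 (n-bits must be one of 224, 256, 384 or 512):

(require '[zero-one.geni.core :as g])

;; 64-character hex digest per row
(-> df
    (g/select (g/sha-2 :payload 256)))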

sha1clj

(sha1 expr)

Params: (e: Column)

Result: Column

Calculates the SHA-1 digest of a binary column and returns the value as a 40 character hex string.

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.558Z

Params: (e: Column)

Result: Column

Calculates the SHA-1 digest of a binary column and returns the value
as a 40 character hex string.


1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.558Z
sourceraw docstring

sha2clj

(sha2 expr n-bits)

Params: (e: Column, numBits: Int)

Result: Column

Calculates the SHA-2 family of hash functions of a binary column and returns the value as a hex string.

column to compute SHA-2 on.

one of 224, 256, 384, or 512.

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.559Z

Params: (e: Column, numBits: Int)

Result: Column

Calculates the SHA-2 family of hash functions of a binary column and
returns the value as a hex string.


column to compute SHA-2 on.

one of 224, 256, 384, or 512.

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.559Z
sourceraw docstring

shift-leftclj

(shift-left expr num-bits)

Params: (e: Column, numBits: Int)

Result: Column

Shift the given value numBits left. If the given value is a long value, this function will return a long value else it will return an integer value.

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.560Z

Params: (e: Column, numBits: Int)

Result: Column

Shift the given value numBits left. If the given value is a long value, this function
will return a long value else it will return an integer value.


1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.560Z
sourceraw docstring

shift-rightclj

(shift-right expr num-bits)

Params: (e: Column, numBits: Int)

Result: Column

(Signed) shift the given value numBits right. If the given value is a long value, it will return a long value else it will return an integer value.

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.562Z

Params: (e: Column, numBits: Int)

Result: Column

(Signed) shift the given value numBits right. If the given value is a long value, it will
return a long value else it will return an integer value.


1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.562Z
sourceraw docstring

shift-right-unsignedclj

(shift-right-unsigned expr num-bits)

Params: (e: Column, numBits: Int)

Result: Column

Unsigned shift the given value numBits right. If the given value is a long value, it will return a long value else it will return an integer value.

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.563Z

Params: (e: Column, numBits: Int)

Result: Column

Unsigned shift the given value numBits right. If the given value is a long value,
it will return a long value else it will return an integer value.


1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.563Z
sourceraw docstring

signclj

(sign expr)

Params: (e: Column)

Result: Column

Computes the signum of the given value.

1.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.566Z

Params: (e: Column)

Result: Column

Computes the signum of the given value.


1.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.566Z
sourceraw docstring

signumclj

(signum expr)

Params: (e: Column)

Result: Column

Computes the signum of the given value.

1.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.566Z

Params: (e: Column)

Result: Column

Computes the signum of the given value.


1.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.566Z
sourceraw docstring

sinclj

(sin expr)

Params: (e: Column)

Result: Column

angle in radians

sine of the angle, as if computed by java.lang.Math.sin

1.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.568Z

Params: (e: Column)

Result: Column

angle in radians

sine of the angle, as if computed by java.lang.Math.sin

1.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.568Z
sourceraw docstring

sinhclj

(sinh expr)

Params: (e: Column)

Result: Column

hyperbolic angle

hyperbolic sine of the given value, as if computed by java.lang.Math.sinh

1.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.570Z

Params: (e: Column)

Result: Column

hyperbolic angle

hyperbolic sine of the given value, as if computed by java.lang.Math.sinh

1.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.570Z
sourceraw docstring

sizeclj

(size expr)

Params: (e: Column)

Result: Column

Returns length of array or map.

The function returns null for null input if spark.sql.legacy.sizeOfNull is set to false or spark.sql.ansi.enabled is set to true. Otherwise, the function returns -1 for null input. With the default settings, the function returns -1 for null input.

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.571Z

Params: (e: Column)

Result: Column

Returns length of array or map.

The function returns null for null input if spark.sql.legacy.sizeOfNull is set to false or
spark.sql.ansi.enabled is set to true. Otherwise, the function returns -1 for null input.
With the default settings, the function returns -1 for null input.


1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.571Z
sourceraw docstring
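
For example, counting the elements of an array column :tags on a hypothetical dataframe df (with default settings a null array yields -1):

(require '[zero-one.geni.core :as g])

(-> df
    (g/select (g/size :tags)))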

skewnessclj

(skewness expr)

Params: (e: Column)

Result: Column

Aggregate function: returns the skewness of the values in a group.

1.6.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.574Z

Params: (e: Column)

Result: Column

Aggregate function: returns the skewness of the values in a group.


1.6.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.574Z
sourceraw docstring

sliceclj

(slice expr start length)

Params: (x: Column, start: Int, length: Int)

Result: Column

Returns an array containing all the elements in x from index start (or starting from the end if start is negative) with the specified length.

the array column to be sliced

the starting index

the length of the slice

2.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.575Z

Params: (x: Column, start: Int, length: Int)

Result: Column

Returns an array containing all the elements in x from index start (or starting from the
end if start is negative) with the specified length.


the array column to be sliced

the starting index

the length of the slice

2.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.575Z
sourceraw docstring
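
A small sketch on a hypothetical dataframe df with an array column :tags; note the 1-based start and that a negative start counts from the end:

(require '[zero-one.geni.core :as g])

;; first three elements, and the last two elements
(-> df
    (g/select (g/slice :tags 1 3)
              (g/slice :tags -2 2)))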

sort-arrayclj

(sort-array expr)
(sort-array expr asc)

Params: (e: Column)

Result: Column

Sorts the input array for the given column in ascending order, according to the natural ordering of the array elements. Null elements will be placed at the beginning of the returned array.

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.577Z

Params: (e: Column)

Result: Column

Sorts the input array for the given column in ascending order,
according to the natural ordering of the array elements.
Null elements will be placed at the beginning of the returned array.


1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.577Z
sourceraw docstring

soundexclj

(soundex expr)

Params: (e: Column)

Result: Column

Returns the soundex code for the specified expression.

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.578Z

Params: (e: Column)

Result: Column

Returns the soundex code for the specified expression.


1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.578Z
sourceraw docstring

spark-partition-idclj

(spark-partition-id)

Params: ()

Result: Column

Partition ID.

1.6.0

This is non-deterministic because it depends on data partitioning and task scheduling.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.579Z

Params: ()

Result: Column

Partition ID.


1.6.0

This is non-deterministic because it depends on data partitioning and task scheduling.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.579Z
sourceraw docstring

splitclj

(split expr pattern)

Params: (str: Column, pattern: String)

Result: Column

Splits str around matches of the given pattern.

a string expression to split

a string representing a regular expression. The regex string should be a Java regular expression.

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.582Z

Params: (str: Column, pattern: String)

Result: Column

Splits str around matches of the given pattern.


a string expression to split

a string representing a regular expression. The regex string should be
               a Java regular expression.

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.582Z
sourceraw docstring
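
For example, splitting a comma-separated string column :csv-line of a hypothetical dataframe df into an array column (the pattern is a Java regex):

(require '[zero-one.geni.core :as g])

(-> df
    (g/select (g/split :csv-line ",")))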

sqrclj

(sqr expr)

Returns the value of the first argument raised to the power of two.

Returns the value of the first argument raised to the power of two.
sourceraw docstring

sqrtclj

(sqrt expr)

Params: (e: Column)

Result: Column

Computes the square root of the specified float value.

1.3.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.584Z

Params: (e: Column)

Result: Column

Computes the square root of the specified float value.


1.3.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.584Z
sourceraw docstring

stdclj

(std expr)

Params: (e: Column)

Result: Column

Aggregate function: alias for stddev_samp.

1.6.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.586Z

Params: (e: Column)

Result: Column

Aggregate function: alias for stddev_samp.


1.6.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.586Z
sourceraw docstring

stddevclj

(stddev expr)

Params: (e: Column)

Result: Column

Aggregate function: alias for stddev_samp.

1.6.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.586Z

Params: (e: Column)

Result: Column

Aggregate function: alias for stddev_samp.


1.6.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.586Z
sourceraw docstring

stddev-popclj

(stddev-pop expr)

Params: (e: Column)

Result: Column

Aggregate function: returns the population standard deviation of the expression in a group.

1.6.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.593Z

Params: (e: Column)

Result: Column

Aggregate function: returns the population standard deviation of
the expression in a group.


1.6.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.593Z
sourceraw docstring

stddev-sampclj

(stddev-samp expr)

Params: (e: Column)

Result: Column

Aggregate function: alias for stddev_samp.

1.6.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.586Z

Params: (e: Column)

Result: Column

Aggregate function: alias for stddev_samp.


1.6.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.586Z
sourceraw docstring

structclj

(struct & exprs)

Params: (cols: Column*)

Result: Column

Creates a new struct column. If the input column is a column in a DataFrame, or a derived column expression that is named (i.e. aliased), its name would be retained as the StructField's name, otherwise, the newly generated StructField's name would be auto generated as col with a suffix index + 1, i.e. col1, col2, col3, ...

1.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.597Z

Params: (cols: Column*)

Result: Column

Creates a new struct column.
If the input column is a column in a DataFrame, or a derived column expression
that is named (i.e. aliased), its name would be retained as the StructField's name,
otherwise, the newly generated StructField's name would be auto generated as
col with a suffix index + 1, i.e. col1, col2, col3, ...


1.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.597Z
sourceraw docstring

substringclj

(substring expr pos len)

Params: (str: Column, pos: Int, len: Int)

Result: Column

Substring starts at pos and is of length len when str is String type or returns the slice of byte array that starts at pos in byte and is of length len when str is Binary type

1.5.0

The position is not zero based, but 1 based index.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.599Z

Params: (str: Column, pos: Int, len: Int)

Result: Column

Substring starts at pos and is of length len when str is String type or
returns the slice of byte array that starts at pos in byte and is of length len
when str is Binary type


1.5.0

The position is not zero based, but 1 based index.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.599Z
sourceraw docstring

substring-indexclj

(substring-index expr delim cnt)

Params: (str: Column, delim: String, count: Int)

Result: Column

Returns the substring from string str before count occurrences of the delimiter delim. If count is positive, everything to the left of the final delimiter (counting from the left) is returned. If count is negative, everything to the right of the final delimiter (counting from the right) is returned. substring_index performs a case-sensitive match when searching for delim.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.600Z

Params: (str: Column, delim: String, count: Int)

Result: Column

Returns the substring from string str before count occurrences of the delimiter delim.
If count is positive, everything to the left of the final delimiter (counting from the left) is
returned. If count is negative, everything to the right of the final delimiter (counting from the
right) is returned. substring_index performs a case-sensitive match when searching for delim.


Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.600Z
sourceraw docstring
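
A sketch of positive versus negative counts, assuming a hypothetical dataframe df with a string column :host holding values like "www.apache.org":

(require '[zero-one.geni.core :as g])

;; count 2  -> "www.apache"
;; count -2 -> "apache.org"
(-> df
    (g/select (g/substring-index :host "." 2)
              (g/substring-index :host "." -2)))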

sum-distinctclj

(sum-distinct expr)

Params: (e: Column)

Result: Column

Aggregate function: returns the sum of distinct values in the expression.

1.3.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.604Z

Params: (e: Column)

Result: Column

Aggregate function: returns the sum of distinct values in the expression.


1.3.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.604Z
sourceraw docstring

tanclj

(tan expr)

Params: (e: Column)

Result: Column

angle in radians

tangent of the given value, as if computed by java.lang.Math.tan

1.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.607Z

Params: (e: Column)

Result: Column

angle in radians

tangent of the given value, as if computed by java.lang.Math.tan

1.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.607Z
sourceraw docstring

tanhclj

(tanh expr)

Params: (e: Column)

Result: Column

hyperbolic angle

hyperbolic tangent of the given value, as if computed by java.lang.Math.tanh

1.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.610Z

Params: (e: Column)

Result: Column

hyperbolic angle

hyperbolic tangent of the given value, as if computed by java.lang.Math.tanh

1.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.610Z
sourceraw docstring

time-windowclj

(time-window time-expr duration)
(time-window time-expr duration slide)
(time-window time-expr duration slide start)

Params: (timeColumn: Column, windowDuration: String, slideDuration: String, startTime: String)

Result: Column

Bucketize rows into one or more time windows given a timestamp specifying column. Window starts are inclusive but the window ends are exclusive, e.g. 12:05 will be in the window [12:05,12:10) but not in [12:00,12:05). Windows can support microsecond precision. Windows in the order of months are not supported. The following example takes the average stock price for a one minute window every 10 seconds starting 5 seconds after the hour:

The windows will look like:

For a streaming query, you may use the function current_timestamp to generate windows on processing time.

The column or the expression to use as the timestamp for windowing by time. The time column must be of TimestampType.

A string specifying the width of the window, e.g. 10 minutes, 1 second. Check org.apache.spark.unsafe.types.CalendarInterval for valid duration identifiers. Note that the duration is a fixed length of time, and does not vary over time according to a calendar. For example, 1 day always means 86,400,000 milliseconds, not a calendar day.

A string specifying the sliding interval of the window, e.g. 1 minute. A new window will be generated every slideDuration. Must be less than or equal to the windowDuration. Check org.apache.spark.unsafe.types.CalendarInterval for valid duration identifiers. This duration is likewise absolute, and does not vary according to a calendar.

The offset with respect to 1970-01-01 00:00:00 UTC with which to start window intervals. For example, in order to have hourly tumbling windows that start 15 minutes past the hour, e.g. 12:15-13:15, 13:15-14:15... provide startTime as 15 minutes.

2.0.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.732Z

Params: (timeColumn: Column, windowDuration: String, slideDuration: String, startTime: String)

Result: Column

Bucketize rows into one or more time windows given a timestamp specifying column. Window
starts are inclusive but the window ends are exclusive, e.g. 12:05 will be in the window
[12:05,12:10) but not in [12:00,12:05). Windows can support microsecond precision. Windows in
the order of months are not supported. The following example takes the average stock price for
a one minute window every 10 seconds starting 5 seconds after the hour:

The windows will look like:

For a streaming query, you may use the function current_timestamp to generate windows on
processing time.


The column or the expression to use as the timestamp for windowing by time.
                  The time column must be of TimestampType.

A string specifying the width of the window, e.g. 10 minutes,
                      1 second. Check org.apache.spark.unsafe.types.CalendarInterval for
                      valid duration identifiers. Note that the duration is a fixed length of
                      time, and does not vary over time according to a calendar. For example,
                      1 day always means 86,400,000 milliseconds, not a calendar day.

A string specifying the sliding interval of the window, e.g. 1 minute.
                     A new window will be generated every slideDuration. Must be less than
                     or equal to the windowDuration. Check
                     org.apache.spark.unsafe.types.CalendarInterval for valid duration
                     identifiers. This duration is likewise absolute, and does not vary
                     according to a calendar.

The offset with respect to 1970-01-01 00:00:00 UTC with which to start
                 window intervals. For example, in order to have hourly tumbling windows that
                 start 15 minutes past the hour, e.g. 12:15-13:15, 13:15-14:15... provide
                 startTime as 15 minutes.

2.0.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.732Z
sourceraw docstring
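
The code example referenced in the scraped docstring did not survive extraction; as a hedged sketch of the same pattern (one-minute windows every 10 seconds, offset 5 seconds past the minute), assuming a hypothetical dataframe df with :timestamp, :stock-id and :price columns and geni's g/group-by, g/agg and g/mean helpers:

(require '[zero-one.geni.core :as g])

;; mean price per sliding time window and stock
(-> df
    (g/group-by (g/time-window :timestamp "1 minute" "10 seconds" "5 seconds") :stock-id)
    (g/agg (g/mean :price)))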

to-csvclj

(to-csv expr)
(to-csv expr options)

Params: (e: Column, options: Map[String, String])

Result: Column

(Java-specific) Converts a column containing a StructType into a CSV string with the specified schema. Throws an exception, in the case of an unsupported type.

a column containing a struct.

options to control how the struct column is converted into a CSV string. It accepts the same options as the json data source.

3.0.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.613Z

Params: (e: Column, options: Map[String, String])

Result: Column

(Java-specific) Converts a column containing a StructType into a CSV string with
the specified schema. Throws an exception, in the case of an unsupported type.


a column containing a struct.

options to control how the struct column is converted into a CSV string.
               It accepts the same options as the json data source.

3.0.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.613Z
sourceraw docstring

to-dateclj

(to-date expr)
(to-date expr date-format)

Params: (e: Column)

Result: Column

Converts the column into DateType by casting rules to DateType.

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.616Z

Params: (e: Column)

Result: Column

Converts the column into DateType by casting rules to DateType.


1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.616Z
sourceraw docstring

to-timestampclj

(to-timestamp expr)
(to-timestamp expr date-format)

Params: (s: Column)

Result: Column

Converts to a timestamp by casting rules to TimestampType.

A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS

A timestamp, or null if the input was a string that could not be cast to a timestamp

2.2.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.623Z

Params: (s: Column)

Result: Column

Converts to a timestamp by casting rules to TimestampType.


A date, timestamp or string. If a string, the data must be in a format that can be
         cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS

A timestamp, or null if the input was a string that could not be cast to a timestamp

2.2.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.623Z
sourceraw docstring
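
For example, on a hypothetical dataframe df with a string column :event-time; the two-argument arity takes an explicit format, and unparseable strings become null:

(require '[zero-one.geni.core :as g])

(-> df
    (g/select (g/to-timestamp :event-time)
              (g/to-timestamp :event-time "yyyy-MM-dd HH:mm:ss")))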

to-utc-timestampclj

(to-utc-timestamp expr)

Params: (ts: Column, tz: String)

Result: Column

Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in the given time zone, and renders that time as a timestamp in UTC. For example, 'GMT+1' would yield '2017-07-14 01:40:00.0'.

A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS

A string detailing the time zone ID that the input should be adjusted to. It should be in the format of either region-based zone IDs or zone offsets. Region IDs must have the form 'area/city', such as 'America/Los_Angeles'. Zone offsets must be in the format '(+|-)HH:mm', for example '-08:00' or '+01:00'. Also 'UTC' and 'Z' are supported as aliases of '+00:00'. Other short names are not recommended to use because they can be ambiguous.

A timestamp, or null if ts was a string that could not be cast to a timestamp or tz was an invalid value

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.626Z

Params: (ts: Column, tz: String)

Result: Column

Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in the given time
zone, and renders that time as a timestamp in UTC. For example, 'GMT+1' would yield
'2017-07-14 01:40:00.0'.


A date, timestamp or string. If a string, the data must be in a format that can be
          cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS

A string detailing the time zone ID that the input should be adjusted to. It should
          be in the format of either region-based zone IDs or zone offsets. Region IDs must
          have the form 'area/city', such as 'America/Los_Angeles'. Zone offsets must be in
          the format '(+|-)HH:mm', for example '-08:00' or '+01:00'. Also 'UTC' and 'Z' are
          supported as aliases of '+00:00'. Other short names are not recommended to use
          because they can be ambiguous.

A timestamp, or null if ts was a string that could not be cast to a timestamp or
        tz was an invalid value

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.626Z
sourceraw docstring

transformclj

(transform expr xform-fn)

Params: (column: Column, f: (Column) ⇒ Column)

Result: Column

Returns an array of elements after applying a transformation to each element in the input array.

the input array column

col => transformed_col, the lambda function to transform the input column

3.0.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.629Z

Params: (column: Column, f: (Column) ⇒ Column)

Result: Column

Returns an array of elements after applying a transformation to each element
in the input array.

the input array column

col => transformed_col, the lambda function to transform the input column

3.0.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.629Z
sourceraw docstring
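
A hedged sketch, assuming geni accepts a plain Clojure function of one column as xform-fn and that a hypothetical dataframe df has an array column :xs:

(require '[zero-one.geni.core :as g])

;; add 1 to every element of :xs
(-> df
    (g/select (g/transform :xs (fn [x] (g/+ x 1)))))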

transform-keysclj

(transform-keys expr key-fn)

Params: (expr: Column, f: (Column, Column) ⇒ Column)

Result: Column

Applies a function to every key-value pair in a map and returns a map with the results of those applications as the new keys for the pairs.

the input map column

(key, value) => new_key, the lambda function to transform the key of input map column

3.0.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.630Z

Params: (expr: Column, f: (Column, Column) ⇒ Column)

Result: Column

Applies a function to every key-value pair in a map and returns
a map with the results of those applications as the new keys for the pairs.

the input map column

(key, value) => new_key, the lambda function to transform the key of input map column

3.0.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.630Z
sourceraw docstring

transform-valuesclj

(transform-values expr key-fn)

Params: (expr: Column, f: (Column, Column) ⇒ Column)

Result: Column

Applies a function to every key-value pair in a map and returns a map with the results of those applications as the new values for the pairs.

the input map column

(key, value) => new_value, the lambda function to transform the value of input map column

3.0.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.638Z

Params: (expr: Column, f: (Column, Column) ⇒ Column)

Result: Column

Applies a function to every key-value pair in a map and returns
a map with the results of those applications as the new values for the pairs.

the input map column

(key, value) => new_value, the lambda function to transform the value of input map
         column

3.0.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.638Z
sourceraw docstring

translateclj

(translate expr match replacement)

Params: (src: Column, matchingString: String, replaceString: String)

Result: Column

Translate any character in the src by a character in replaceString. The characters in replaceString correspond to the characters in matchingString. The translate will happen when any character in the string matches the character in the matchingString.

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.639Z

Params: (src: Column, matchingString: String, replaceString: String)

Result: Column

Translate any character in the src by a character in replaceString.
The characters in replaceString correspond to the characters in matchingString.
The translate will happen when any character in the string matches the character
in the matchingString.


1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.639Z
sourceraw docstring

trimclj

(trim expr trim-string)

Params: (e: Column)

Result: Column

Trim the spaces from both ends for the specified string column.

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.641Z

Params: (e: Column)

Result: Column

Trim the spaces from both ends for the specified string column.


1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.641Z
sourceraw docstring

unbase-64clj

(unbase-64 expr)

Params: (e: Column)

Result: Column

Decodes a BASE64 encoded string column and returns it as a binary column. This is the reverse of base64.

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.702Z

Params: (e: Column)

Result: Column

Decodes a BASE64 encoded string column and returns it as a binary column.
This is the reverse of base64.


1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.702Z
sourceraw docstring

unbase64clj

(unbase64 expr)

Params: (e: Column)

Result: Column

Decodes a BASE64 encoded string column and returns it as a binary column. This is the reverse of base64.

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.702Z

Params: (e: Column)

Result: Column

Decodes a BASE64 encoded string column and returns it as a binary column.
This is the reverse of base64.


1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.702Z
sourceraw docstring

unhexclj

(unhex expr)

Params: (column: Column)

Result: Column

Inverse of hex. Interprets each pair of characters as a hexadecimal number and converts to the byte representation of number.

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.703Z

Params: (column: Column)

Result: Column

Inverse of hex. Interprets each pair of characters as a hexadecimal number
and converts to the byte representation of number.


1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.703Z
sourceraw docstring

unix-timestampclj

(unix-timestamp)
(unix-timestamp expr)
(unix-timestamp expr pattern)

Params: ()

Result: Column

Returns the current Unix timestamp (in seconds) as a long.

1.5.0

All calls of unix_timestamp within the same query return the same value (i.e. the current timestamp is calculated at the start of query evaluation).

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.710Z

Params: ()

Result: Column

Returns the current Unix timestamp (in seconds) as a long.


1.5.0

All calls of unix_timestamp within the same query return the same value
(i.e. the current timestamp is calculated at the start of query evaluation).

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.710Z
sourceraw docstring

upperclj

(upper expr)

Params: (e: Column)

Result: Column

Converts a string column to upper case.

1.3.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.712Z

Params: (e: Column)

Result: Column

Converts a string column to upper case.


1.3.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.712Z
sourceraw docstring

var-popclj

(var-pop expr)

Params: (e: Column)

Result: Column

Aggregate function: returns the population variance of the values in a group.

1.6.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.714Z

Params: (e: Column)

Result: Column

Aggregate function: returns the population variance of the values in a group.


1.6.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.714Z
sourceraw docstring

var-sampclj

(var-samp expr)

Params: (e: Column)

Result: Column

Aggregate function: alias for var_samp.

1.6.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.718Z

Params: (e: Column)

Result: Column

Aggregate function: alias for var_samp.


1.6.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.718Z
sourceraw docstring

varianceclj

(variance expr)

Params: (e: Column)

Result: Column

Aggregate function: alias for var_samp.

1.6.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.718Z

Params: (e: Column)

Result: Column

Aggregate function: alias for var_samp.


1.6.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.718Z
sourceraw docstring

week-of-yearclj

(week-of-year expr)

Params: (e: Column)

Result: Column

Extracts the week number as an integer from a given date/timestamp/string.

A week is considered to start on a Monday and week 1 is the first week with more than 3 days, as defined by ISO 8601

An integer, or null if the input was a string that could not be cast to a date

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.723Z

Params: (e: Column)

Result: Column

Extracts the week number as an integer from a given date/timestamp/string.

A week is considered to start on a Monday and week 1 is the first week with more than 3 days,
as defined by ISO 8601


An integer, or null if the input was a string that could not be cast to a date

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.723Z
sourceraw docstring

weekofyearclj

(weekofyear expr)

Params: (e: Column)

Result: Column

Extracts the week number as an integer from a given date/timestamp/string.

A week is considered to start on a Monday and week 1 is the first week with more than 3 days, as defined by ISO 8601

An integer, or null if the input was a string that could not be cast to a date

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.723Z

Params: (e: Column)

Result: Column

Extracts the week number as an integer from a given date/timestamp/string.

A week is considered to start on a Monday and week 1 is the first week with more than 3 days,
as defined by ISO 8601


An integer, or null if the input was a string that could not be cast to a date

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.723Z
sourceraw docstring

whenclj

(when condition if-expr)
(when condition if-expr else-expr)

Params: (condition: Column, value: Any)

Result: Column

Evaluates a list of conditions and returns one of multiple possible result expressions. If otherwise is not defined at the end, null is returned for unmatched conditions.

1.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.724Z

Params: (condition: Column, value: Any)

Result: Column

Evaluates a list of conditions and returns one of multiple possible result expressions.
If otherwise is not defined at the end, null is returned for unmatched conditions.

1.4.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.724Z
sourceraw docstring
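
A small sketch on a hypothetical dataframe df with a numeric column :score; the three-argument arity supplies the otherwise branch, and the literal results are wrapped with g/lit:

(require '[zero-one.geni.core :as g])

;; "fail" when :score < 50, otherwise "pass"
(-> df
    (g/select (g/when (g/< :score 50) (g/lit "fail") (g/lit "pass"))))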

windowclj

(window time-expr duration)
(window time-expr duration slide)
(window time-expr duration slide start)

Params: (timeColumn: Column, windowDuration: String, slideDuration: String, startTime: String)

Result: Column

Bucketize rows into one or more time windows given a timestamp specifying column. Window starts are inclusive but the window ends are exclusive, e.g. 12:05 will be in the window [12:05,12:10) but not in [12:00,12:05). Windows can support microsecond precision. Windows in the order of months are not supported. The following example takes the average stock price for a one minute window every 10 seconds starting 5 seconds after the hour:

The windows will look like:

For a streaming query, you may use the function current_timestamp to generate windows on processing time.

The column or the expression to use as the timestamp for windowing by time. The time column must be of TimestampType.

A string specifying the width of the window, e.g. 10 minutes, 1 second. Check org.apache.spark.unsafe.types.CalendarInterval for valid duration identifiers. Note that the duration is a fixed length of time, and does not vary over time according to a calendar. For example, 1 day always means 86,400,000 milliseconds, not a calendar day.

A string specifying the sliding interval of the window, e.g. 1 minute. A new window will be generated every slideDuration. Must be less than or equal to the windowDuration. Check org.apache.spark.unsafe.types.CalendarInterval for valid duration identifiers. This duration is likewise absolute, and does not vary according to a calendar.

The offset with respect to 1970-01-01 00:00:00 UTC with which to start window intervals. For example, in order to have hourly tumbling windows that start 15 minutes past the hour, e.g. 12:15-13:15, 13:15-14:15... provide startTime as 15 minutes.

2.0.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.732Z

Params: (timeColumn: Column, windowDuration: String, slideDuration: String, startTime: String)

Result: Column

Bucketize rows into one or more time windows given a timestamp specifying column. Window
starts are inclusive but the window ends are exclusive, e.g. 12:05 will be in the window
[12:05,12:10) but not in [12:00,12:05). Windows can support microsecond precision. Windows in
the order of months are not supported. The following example takes the average stock price for
a one minute window every 10 seconds starting 5 seconds after the hour:

The windows will look like:

For a streaming query, you may use the function current_timestamp to generate windows on
processing time.


The column or the expression to use as the timestamp for windowing by time.
                  The time column must be of TimestampType.

A string specifying the width of the window, e.g. 10 minutes,
                      1 second. Check org.apache.spark.unsafe.types.CalendarInterval for
                      valid duration identifiers. Note that the duration is a fixed length of
                      time, and does not vary over time according to a calendar. For example,
                      1 day always means 86,400,000 milliseconds, not a calendar day.

A string specifying the sliding interval of the window, e.g. 1 minute.
                     A new window will be generated every slideDuration. Must be less than
                     or equal to the windowDuration. Check
                     org.apache.spark.unsafe.types.CalendarInterval for valid duration
                     identifiers. This duration is likewise absolute, and does not vary
                     according to a calendar.

The offset with respect to 1970-01-01 00:00:00 UTC with which to start
                 window intervals. For example, in order to have hourly tumbling windows that
                 start 15 minutes past the hour, e.g. 12:15-13:15, 13:15-14:15... provide
                 startTime as 15 minutes.

2.0.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.732Z
sourceraw docstring

xxhash-64clj

(xxhash-64 & exprs)

Params: (cols: Column*)

Result: Column

Calculates the hash code of given columns using the 64-bit variant of the xxHash algorithm, and returns the result as a long column.

3.0.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.733Z

Params: (cols: Column*)

Result: Column

Calculates the hash code of given columns using the 64-bit
variant of the xxHash algorithm, and returns the result as a long
column.


3.0.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.733Z
sourceraw docstring

xxhash64clj

(xxhash64 & exprs)

Params: (cols: Column*)

Result: Column

Calculates the hash code of given columns using the 64-bit variant of the xxHash algorithm, and returns the result as a long column.

3.0.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.733Z

Params: (cols: Column*)

Result: Column

Calculates the hash code of given columns using the 64-bit
variant of the xxHash algorithm, and returns the result as a long
column.


3.0.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.733Z
sourceraw docstring

yearclj

(year expr)

Params: (e: Column)

Result: Column

Extracts the year as an integer from a given date/timestamp/string.

An integer, or null if the input was a string that could not be cast to a date

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.734Z

Params: (e: Column)

Result: Column

Extracts the year as an integer from a given date/timestamp/string.

An integer, or null if the input was a string that could not be cast to a date

1.5.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.734Z
sourceraw docstring

zip-withclj

(zip-with left right merge-fn)

Params: (left: Column, right: Column, f: (Column, Column) ⇒ Column)

Result: Column

Merge two given arrays, element-wise, into a single array using a function. If one array is shorter, nulls are appended at the end to match the length of the longer array, before applying the function.

the left input array column

the right input array column

(lCol, rCol) => col, the lambda function to merge two input columns into one column

3.0.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.737Z

Params: (left: Column, right: Column, f: (Column, Column) ⇒ Column)

Result: Column

Merge two given arrays, element-wise, into a single array using a function.
If one array is shorter, nulls are appended at the end to match the length of the longer
array, before applying the function.

the left input array column

the right input array column

(lCol, rCol) => col, the lambda function to merge two input columns into one column

3.0.0

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html

Timestamp: 2020-10-19T01:56:22.737Z
sourceraw docstring
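
A hedged sketch, assuming geni accepts a plain Clojure function of two columns as merge-fn and that a hypothetical dataframe df has array columns :xs and :ys of possibly different lengths:

(require '[zero-one.geni.core :as g])

;; element-wise sum; the shorter array is null-padded first,
;; so positions past its end come out null
(-> df
    (g/select (g/zip-with :xs :ys (fn [x y] (g/+ x y)))))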
