(! expr)
Params: (e: Column)
Result: Column
Inversion of a boolean expression, i.e. NOT.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.497Z
(** base exponent)
Params: (l: Column, r: Column)
Result: Column
Returns the value of the first argument raised to the power of the second argument.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.520Z
(->date-col expr)
(->date-col expr date-format)
Params: (e: Column)
Result: Column
Converts the column into DateType by casting rules to DateType.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.616Z
(->timestamp-col expr)
(->timestamp-col expr date-format)
Params: (s: Column)
Result: Column
Converts to a timestamp by casting rules to TimestampType.
A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
A timestamp, or null if the input was a string that could not be cast to a timestamp
2.2.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.623Z
(->utc-timestamp expr)
Params: (ts: Column, tz: String)
Result: Column
Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in the given time zone, and renders that time as a timestamp in UTC. For example, 'GMT+1' would yield '2017-07-14 01:40:00.0'.
A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
A string detailing the time zone ID that the input should be adjusted to. It should be in the format of either region-based zone IDs or zone offsets. Region IDs must have the form 'area/city', such as 'America/Los_Angeles'. Zone offsets must be in the format '(+|-)HH:mm', for example '-08:00' or '+01:00'. Also 'UTC' and 'Z' are supported as aliases of '+00:00'. Other short names are not recommended to use because they can be ambiguous.
A timestamp, or null if ts was a string that could not be cast to a timestamp or tz was an invalid value
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.626Z
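A minimal Scala sketch of the underlying to_utc_timestamp call (the DataFrame df is a hypothetical name):
  import org.apache.spark.sql.functions._
  // interpret the literal as GMT+1 local time and render it as a UTC timestamp
  df.select(to_utc_timestamp(lit("2017-07-14 02:40:00.0"), "GMT+1"))  // 2017-07-14 01:40:00.0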
(abs expr)
Params: (e: Column)
Result: Column
Computes the absolute value of a numeric value.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.169Z
(acos expr)
Params: (e: Column)
Result: Column
inverse cosine of e in radians, as if computed by java.lang.Math.acos
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.171Z
(add-months expr months)
Params: (startDate: Column, numMonths: Int)
Result: Column
Returns the date that is numMonths after startDate.
A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
The number of months to add to startDate, can be negative to subtract months
A date, or null if startDate was a string that could not be cast to a date
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.174Z
(aggregate expr init merge-fn)
(aggregate expr init merge-fn finish-fn)
Params: (expr: Column, initialValue: Column, merge: (Column, Column) ⇒ Column, finish: (Column) ⇒ Column)
Result: Column
Applies a binary operator to an initial state and all elements in the array, and reduces this to a single state. The final state is converted into the final result by applying a finish function.
the input array column
the initial value
(combined_value, input_value) => combined_value, the merge function to merge an input value to the combined_value
combined_value => final_value, the lambda function to convert the combined value of all inputs to final result
3.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.177Z
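A rough Scala illustration of the underlying aggregate function (the DataFrame df and the array column values are made up):
  import org.apache.spark.sql.functions._
  // sum the array elements starting from 0, then scale the total by 10 in the finish step
  df.select(aggregate(col("values"), lit(0), (acc, x) => acc + x, acc => acc * 10))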
(approx-count-distinct expr)
(approx-count-distinct expr rsd)
Params: (e: Column)
Result: Column
Deprecated since 2.1.0; use approx_count_distinct instead.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.742Z
(array & exprs)
Params: (cols: Column*)
Result: Column
Creates a new array column. The input columns must all have the same data type.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.184Z
(array-contains expr value)
Params: (column: Column, value: Any)
Result: Column
Returns null if the array is null, true if the array contains value, and false otherwise.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.185Z
(array-distinct expr)
Params: (e: Column)
Result: Column
Removes duplicate values from the array.
2.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.186Z
(array-except left right)
Params: (col1: Column, col2: Column)
Result: Column
Returns an array of the elements in the first array but not in the second array, without duplicates. The order of the elements in the result is not determined.
2.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.188Z
(array-intersect left right)
Params: (col1: Column, col2: Column)
Result: Column
Returns an array of the elements in the intersection of the given two arrays, without duplicates.
2.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.189Z
(array-join expr delimiter)
(array-join expr delimiter null-replacement)
Params: (column: Column, delimiter: String, nullReplacement: String)
Result: Column
Concatenates the elements of column using the delimiter. Null values are replaced with nullReplacement.
2.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.194Z
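For illustration, a hedged Scala call to the underlying array_join (the array column tags is hypothetical):
  import org.apache.spark.sql.functions._
  // join the array elements with ", ", writing "n/a" in place of null elements
  df.select(array_join(col("tags"), ", ", "n/a"))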
(array-max expr)
Params: (e: Column)
Result: Column
Returns the maximum value in the array.
2.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.195Z
(array-min expr)
Params: (e: Column)
Result: Column
Returns the minimum value in the array.
2.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.197Z
(array-position expr value)
Params: (column: Column, value: Any)
Result: Column
Locates the position of the first occurrence of the value in the given array as a long. Returns null if either of the arguments is null.
2.4.0
The position is 1-based, not 0-based. Returns 0 if the value could not be found in the array.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.198Z
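A small Scala sketch of the underlying array_position showing the 1-based indexing (a literal array is used for illustration):
  import org.apache.spark.sql.functions._
  df.select(array_position(array(lit("a"), lit("b"), lit("c")), "b"))  // 2 (1-based); 0 if not found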
(array-remove expr element)
Params: (column: Column, element: Any)
Result: Column
Removes all elements that are equal to element from the given array.
2.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.199Z
(array-repeat left right)
Params: (left: Column, right: Column)
Result: Column
Creates an array containing the left argument repeated the number of times given by the right argument.
2.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.201Z
(array-sort expr)
Params: (e: Column)
Result: Column
Sorts the input array in ascending order. The elements of the input array must be orderable. Null elements will be placed at the end of the returned array.
2.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.202Z
(array-union left right)
Params: (col1: Column, col2: Column)
Result: Column
Returns an array of the elements in the union of the given two arrays, without duplicates.
2.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.204Z
(arrays-overlap left right)
Params: (a1: Column, a2: Column)
Result: Column
Returns true if a1 and a2 have at least one non-null element in common. If not and both the arrays are non-empty and any of them contains a null, it returns null. It returns false otherwise.
2.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.209Z
(arrays-zip & exprs)
Params: (e: Column*)
Result: Column
Returns a merged array of structs in which the N-th struct contains all N-th values of input arrays.
2.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.211Z
(ascii expr)
Params: (e: Column)
Result: Column
Computes the numeric value of the first character of the string column, and returns the result as an int column.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.216Z
(asin expr)
Params: (e: Column)
Result: Column
inverse sine of e in radians, as if computed by java.lang.Math.asin
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.219Z
(atan expr)
Params: (e: Column)
Result: Column
inverse tangent of e, as if computed by java.lang.Math.atan
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.221Z
(atan-2 expr-x expr-y)
Params: (y: Column, x: Column)
Result: Column
coordinate on y-axis
coordinate on x-axis
the theta component of the point (r, theta) in polar coordinates that corresponds to the point (x, y) in Cartesian coordinates, as if computed by java.lang.Math.atan2
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.233Z
(atan2 expr-x expr-y)
Params: (y: Column, x: Column)
Result: Column
coordinate on y-axis
coordinate on x-axis
the theta component of the point (r, theta) in polar coordinates that corresponds to the point (x, y) in Cartesian coordinates, as if computed by java.lang.Math.atan2
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.233Z
(base-64 expr)
Params: (e: Column)
Result: Column
Computes the BASE64 encoding of a binary column and returns it as a string column. This is the reverse of unbase64.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.236Z
(base64 expr)
Params: (e: Column)
Result: Column
Computes the BASE64 encoding of a binary column and returns it as a string column. This is the reverse of unbase64.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.236Z
(bin expr)
Params: (e: Column)
Result: Column
An expression that returns the string representation of the binary value of the given long column. For example, bin("12") returns "1100".
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.238Z
(bitwise-not expr)
Params: (e: Column)
Result: Column
Computes bitwise NOT (~) of a number.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.239Z
(broadcast dataframe)
Params: (df: Dataset[T])
Result: Dataset[T]
Marks a DataFrame as small enough for use in broadcast joins.
The following example marks the right DataFrame for broadcast hash join using joinKey.
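(Sketched here in the underlying Spark Scala API; largeDF, smallDF and joinKey are hypothetical names.)
  import org.apache.spark.sql.functions.broadcast
  // hint that smallDF is small enough to be broadcast to every executor
  largeDF.join(broadcast(smallDF), "joinKey")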
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.240Z
(bround expr)
Params: (e: Column)
Result: Column
Returns the value of the column e rounded to 0 decimal places with HALF_EVEN round mode.
2.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.243Z
(cbrt expr)
Params: (e: Column)
Result: Column
Computes the cube-root of the given value.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.253Z
(ceil expr)
Params: (e: Column)
Result: Column
Computes the ceiling of the given value.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.255Z
(collect-list expr)
Params: (e: Column)
Result: Column
Aggregate function: returns a list of objects with duplicates.
1.6.0
The function is non-deterministic because the order of collected results depends on the order of the rows which may be non-deterministic after a shuffle.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.261Z
(collect-set expr)
Params: (e: Column)
Result: Column
Aggregate function: returns a set of objects with duplicate elements eliminated.
1.6.0
The function is non-deterministic because the order of collected results depends on the order of the rows which may be non-deterministic after a shuffle.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.263Z
(concat & exprs)
Params: (exprs: Column*)
Result: Column
Concatenates multiple input columns together into a single column. The function works with strings, binary and compatible array columns.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.265Z
(concat-ws sep & exprs)
Params: (sep: String, exprs: Column*)
Result: Column
Concatenates multiple input string columns together into a single string column, using the given separator.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.267Z
(conv expr from-base to-base)
Params: (num: Column, fromBase: Int, toBase: Int)
Result: Column
Converts a number in a string column from one base to another.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.268Z
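A hedged Scala example of the underlying conv call (a literal is used for illustration):
  import org.apache.spark.sql.functions._
  df.select(conv(lit("100"), 2, 10))  // "4": binary 100 converted to base 10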
(cos expr)
Params: (e: Column)
Result: Column
angle in radians
cosine of the angle, as if computed by java.lang.Math.cos
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.272Z
(cosh expr)
Params: (e: Column)
Result: Column
hyperbolic angle
hyperbolic cosine of the angle, as if computed by java.lang.Math.cosh
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.275Z
(count-distinct & exprs)
Params: (expr: Column, exprs: Column*)
Result: Column
Aggregate function: returns the number of distinct items in a group.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.279Z
(covar l-expr r-expr)
Params: (column1: Column, column2: Column)
Result: Column
Aggregate function: returns the sample covariance for two columns.
2.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.284Z
(covar-pop l-expr r-expr)
Params: (column1: Column, column2: Column)
Result: Column
Aggregate function: returns the population covariance for two columns.
2.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.282Z
(covar-samp l-expr r-expr)
Params: (column1: Column, column2: Column)
Result: Column
Aggregate function: returns the sample covariance for two columns.
2.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.284Z
(crc-32 expr)
Params: (e: Column)
Result: Column
Calculates the cyclic redundancy check value (CRC32) of a binary column and returns the value as a bigint.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.285Z
(crc32 expr)
Params: (e: Column)
Result: Column
Calculates the cyclic redundancy check value (CRC32) of a binary column and returns the value as a bigint.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.285Z
(cube-root expr)
Params: (e: Column)
Result: Column
Computes the cube-root of the given value.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.253Z
(cume-dist)
Params: ()
Result: Column
Window function: returns the cumulative distribution of values within a window partition, i.e. the fraction of rows that are below the current row.
1.6.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.286Z
(current-date)
Params: ()
Result: Column
Returns the current date as a date column.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.287Z
(current-timestamp)
Params: ()
Result: Column
Returns the current timestamp as a timestamp column.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.288Z
(date-add expr days)
Params: (start: Column, days: Int)
Result: Column
Returns the date that is days days after start
A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
The number of days to add to start, can be negative to subtract days
A date, or null if start was a string that could not be cast to a date
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.295Z
(date-diff l-expr r-expr)
Params: (end: Column, start: Column)
Result: Column
Returns the number of days from start to end.
Only considers the date part of the input. For example:
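(Illustrated here with the underlying Spark Scala datediff; the literals are arbitrary.)
  import org.apache.spark.sql.functions._
  datediff(lit("2018-01-10 00:00:01"), lit("2018-01-09 23:59:59"))  // 1, since only the date parts are compared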
A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
An integer, or null if either end or start were strings that could not be cast to a date. Negative if end is before start
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.304Z
(date-format expr date-fmt)
Params: (dateExpr: Column, format: String)
Result: Column
Converts a date/timestamp/string to a value of string in the format specified by the date format given by the second argument.
See Datetime Patterns for valid date and time format patterns
A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
A pattern dd.MM.yyyy would return a string like 18.03.1993
A string, or null if dateExpr was a string that could not be cast to a timestamp
1.5.0
IllegalArgumentException if the format pattern is invalid
Use specialized functions like year whenever possible as they benefit from a specialized implementation.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.297Z
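A minimal Scala sketch of the underlying date_format call (the ts column is hypothetical):
  import org.apache.spark.sql.functions._
  df.select(date_format(col("ts"), "dd.MM.yyyy"))  // e.g. "18.03.1993"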
(date-sub expr days)
Params: (start: Column, days: Int)
Result: Column
Returns the date that is days days before start
A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
The number of days to subtract from start, can be negative to add days
A date, or null if start was a string that could not be cast to a date
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.300Z
(date-trunc fmt expr)
Params: (format: String, timestamp: Column)
Result: Column
Returns timestamp truncated to the unit specified by the format.
For example, date_trunc("year", "2018-11-19 12:01:19") returns 2018-01-01 00:00:00
A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
A timestamp, or null if timestamp was a string that could not be cast to a timestamp or format was an invalid value
2.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.302Z
(datediff l-expr r-expr)
Params: (end: Column, start: Column)
Result: Column
Returns the number of days from start to end.
Only considers the date part of the input; see the example under date-diff above.
A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
An integer, or null if either end or start were strings that could not be cast to a date. Negative if end is before start
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.304Z
(day-of-month expr)
Params: (e: Column)
Result: Column
Extracts the day of the month as an integer from a given date/timestamp/string.
An integer, or null if the input was a string that could not be cast to a date
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.305Z
(day-of-week expr)
Params: (e: Column)
Result: Column
Extracts the day of the week as an integer from a given date/timestamp/string. Ranges from 1 for a Sunday through to 7 for a Saturday
An integer, or null if the input was a string that could not be cast to a date
2.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.306Z
(day-of-year expr)
Params: (e: Column)
Result: Column
Extracts the day of the year as an integer from a given date/timestamp/string.
An integer, or null if the input was a string that could not be cast to a date
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.307Z
(dayofmonth expr)
Params: (e: Column)
Result: Column
Extracts the day of the month as an integer from a given date/timestamp/string.
An integer, or null if the input was a string that could not be cast to a date
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.305Z
(dayofweek expr)
Params: (e: Column)
Result: Column
Extracts the day of the week as an integer from a given date/timestamp/string. Ranges from 1 for a Sunday through to 7 for a Saturday
An integer, or null if the input was a string that could not be cast to a date
2.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.306Z
(dayofyear expr)
Params: (e: Column)
Result: Column
Extracts the day of the year as an integer from a given date/timestamp/string.
An integer, or null if the input was a string that could not be cast to a date
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.307Z
(decode expr charset)
Params: (value: Column, charset: String)
Result: Column
Computes the first argument into a string from a binary using the provided character set (one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16'). If either argument is null, the result will also be null.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.309Z
(degrees expr)
Params: (e: Column)
Result: Column
Converts an angle measured in radians to an approximately equivalent angle measured in degrees.
angle in radians
angle in degrees, as if computed by java.lang.Math.toDegrees
2.1.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.312Z
(dense-rank)
Params: ()
Result: Column
Window function: returns the rank of rows within a window partition, without any gaps.
The difference between rank and dense_rank is that dense_rank leaves no gaps in the ranking sequence when there are ties. That is, if you were ranking a competition using dense_rank and had three people tie for second place, you would say that all three were in second place and that the next person came in third. Rank, by contrast, gives sequential numbers, so the person that came in third place (after the ties) would register as coming in fifth.
This is equivalent to the DENSE_RANK function in SQL.
1.6.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.313Z
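A hedged Scala sketch of the underlying dense_rank used over a window (the dept and salary columns are made up):
  import org.apache.spark.sql.expressions.Window
  import org.apache.spark.sql.functions._
  // rank rows within each department by salary, with no gaps after ties
  df.withColumn("rank", dense_rank().over(Window.partitionBy("dept").orderBy("salary")))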
(element-at expr value)
Params: (column: Column, value: Any)
Result: Column
Returns the element of the array at the given index if column is an array, or the value for the given key if column is a map.
2.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.318Z
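For illustration, the two behaviours of the underlying element_at in Scala (letters is assumed to be an array column, scores a map column):
  import org.apache.spark.sql.functions._
  df.select(element_at(col("letters"), 1))        // first array element (1-based index)
  df.select(element_at(col("scores"), "alice"))   // value for key "alice" in the map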
(encode expr charset)
Params: (value: Column, charset: String)
Result: Column
Computes the first argument into a binary from a string using the provided character set (one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16'). If either argument is null, the result will also be null.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.319Z
(exists expr predicate)
Params: (column: Column, f: (Column) ⇒ Column)
Result: Column
Returns whether a predicate holds for one or more elements in the array.
the input array column
col => predicate, the Boolean predicate to check the input column
3.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.322Z
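A minimal Scala sketch of the underlying exists call (xs is a hypothetical array column):
  import org.apache.spark.sql.functions._
  // true if any element of xs is greater than 3
  df.select(exists(col("xs"), x => x > 3))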
(exp expr)
Params: (e: Column)
Result: Column
Computes the exponential of the given value.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.324Z
(explode expr)
Params: (e: Column)
Result: Column
Creates a new row for each element in the given array or map column. Uses the default column name col for elements in the array and key and value for elements in the map unless specified otherwise.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.325Z
(explode-outer expr)
Params: (e: Column)
Result: Column
Creates a new row for each element in the given array or map column. Uses the default column name col for elements in the array and key and value for elements in the map unless specified otherwise. Unlike explode, if the array or map is null or empty, a row with null is produced.
2.2.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.325Z
(expm-1 expr)
Params: (e: Column)
Result: Column
Computes the exponential of the given value minus one.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.329Z
(expm1 expr)
Params: (e: Column)
Result: Column
Computes the exponential of the given value minus one.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.329Z
(expr s)
Params: (expr: String)
Result: Column
Parses the expression string into the column that it represents, similar to Dataset#selectExpr.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.330Z
(factorial expr)
Params: (e: Column)
Result: Column
Computes the factorial of the given value.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.331Z
(flatten expr)
Params: (e: Column)
Result: Column
Creates a single array from an array of arrays. If a structure of nested arrays is deeper than two levels, only one level of nesting is removed.
2.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.345Z
(floor expr)
Params: (e: Column)
Result: Column
Computes the floor of the given value.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.347Z
(forall expr predicate)
Params: (column: Column, f: (Column) ⇒ Column)
Result: Column
Returns whether a predicate holds for every element in the array.
the input array column
col => predicate, the Boolean predicate to check the input column
3.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.349Z
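A minimal Scala sketch of the underlying Spark function; df, the scores array column, and the threshold are hypothetical.
  import org.apache.spark.sql.functions.{col, forall, lit}
  // True only if every element of the array column satisfies the predicate.
  df.select(forall(col("scores"), x => x >= lit(0)).as("all_non_negative"))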
(format-number expr decimal-places)
Params: (x: Column, d: Int)
Result: Column
Formats numeric column x to a format like '#,###,###.##', rounded to d decimal places with HALF_EVEN round mode, and returns the result as a string column.
If d is 0, the result has no decimal point or fractional part. If d is less than 0, the result will be null.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.350Z
(format-string fmt & exprs)
Params: (format: String, arguments: Column*)
Result: Column
Formats the arguments in printf-style and returns the result as a string column.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.351Z
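A minimal Scala sketch of the underlying Spark function; df and the country/id columns are hypothetical.
  import org.apache.spark.sql.functions.{col, format_string}
  // printf-style formatting; %s and %04d consume the argument columns in order.
  df.select(format_string("%s-%04d", col("country"), col("id")).as("tag"))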
(from-csv expr schema)
(from-csv expr schema options)
Params: (e: Column, schema: StructType, options: Map[String, String])
Result: Column
Parses a column containing a CSV string into a StructType with the specified schema. Returns null, in the case of an unparseable string.
a string column containing CSV data.
the schema to use when parsing the CSV string
options to control how the CSV is parsed. Accepts the same options as the CSV data source.
3.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.354Z
(from-json expr schema)
(from-json expr schema options)
Params: (e: Column, schema: StructType, options: Map[String, String])
Result: Column
(Scala-specific) Parses a column containing a JSON string into a StructType with the specified schema. Returns null, in the case of an unparseable string.
a string column containing JSON data.
the schema to use when parsing the json string
options to control how the json is parsed. Accepts the same options as the json data source.
2.1.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.372Z
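A minimal Scala sketch of the underlying Spark function; df, the payload column, and the schema fields are hypothetical.
  import org.apache.spark.sql.functions.{col, from_json}
  import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}
  // Parse a JSON string column into a struct; unparseable rows yield null.
  val schema = StructType(Seq(StructField("name", StringType), StructField("age", IntegerType)))
  df.select(from_json(col("payload"), schema).as("parsed"))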
(from-unixtime expr)
(from-unixtime expr fmt)
Params: (ut: Column)
Result: Column
Converts the number of seconds from unix epoch (1970-01-01 00:00:00 UTC) to a string representing the timestamp of that moment in the current system time zone in the yyyy-MM-dd HH:mm:ss format.
A number of a type that is castable to a long, such as string or integer. Can be negative for timestamps before the unix epoch
A string, or null if the input was a string that could not be cast to a long
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.375Z
(greatest & exprs)
Params: (exprs: Column*)
Result: Column
Returns the greatest value of the list of values, skipping null values. This function takes at least 2 parameters. It will return null iff all parameters are null.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.382Z
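A minimal Scala sketch of the underlying Spark function; df and the quarterly columns are hypothetical.
  import org.apache.spark.sql.functions.{col, greatest}
  // Row-wise maximum across the given columns, skipping nulls.
  df.select(greatest(col("q1"), col("q2"), col("q3")).as("best_quarter"))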
(grouping expr)
Params: (e: Column)
Result: Column
Aggregate function: indicates whether a specified column in a GROUP BY list is aggregated or not, returns 1 for aggregated or 0 for not aggregated in the result set.
2.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.388Z
(grouping-id & exprs)
Params: (cols: Column*)
Result: Column
Aggregate function: returns the level of grouping, equal to (grouping(c1) << (n-1)) + (grouping(c2) << (n-2)) + ... + grouping(cn).
2.0.0
The list of columns should match the grouping columns exactly, or be empty (meaning all of the grouping columns).
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.390Z
(hash & exprs)
Params: (cols: Column*)
Result: Column
Calculates the hash code of given columns, and returns the result as an int column.
2.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.391Z
(hex expr)
Params: (column: Column)
Result: Column
Computes hex value of the given column.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.393Z
(hour expr)
Params: (e: Column)
Result: Column
Extracts the hours as an integer from a given date/timestamp/string.
An integer, or null if the input was a string that could not be cast to a date
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.394Z
(hypot left-expr right-expr)
Params: (l: Column, r: Column)
Result: Column
Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.406Z
(initcap expr)
Params: (e: Column)
Result: Column
Returns a new string column by converting the first letter of each word to uppercase. Words are delimited by whitespace.
For example, "hello world" will become "Hello World".
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.407Z
(input-file-name)
Params: ()
Result: Column
Creates a string column for the file name of the current Spark task.
1.6.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.408Z
(instr expr substr)
Params: (str: Column, substring: String)
Result: Column
Locate the position of the first occurrence of substr column in the given string. Returns null if either of the arguments are null.
1.5.0
The position is not zero based, but 1 based index. Returns 0 if substr could not be found in str.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.409Z
(kurtosis expr)
Params: (e: Column)
Result: Column
Aggregate function: returns the kurtosis of the values in a group.
1.6.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.416Z
(lag expr offset)
(lag expr offset default)
Params: (e: Column, offset: Int)
Result: Column
Window function: returns the value that is offset rows before the current row, and null if there are fewer than offset rows before the current row. For example, an offset of one will return the previous row at any given point in the window partition.
This is equivalent to the LAG function in SQL.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.421Z
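A minimal Scala sketch of the underlying Spark function used over a window; df, user_id, event_time, and amount are hypothetical.
  import org.apache.spark.sql.expressions.Window
  import org.apache.spark.sql.functions.{col, lag}
  // Previous amount per user in event-time order; 0 when there is no previous row.
  val w = Window.partitionBy("user_id").orderBy("event_time")
  df.withColumn("prev_amount", lag(col("amount"), 1, 0).over(w))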
(last-day expr)
Params: (e: Column)
Result: Column
Returns the last day of the month which the given date belongs to. For example, input "2015-07-27" returns "2015-07-31" since July 31 is the last day of the month in July 2015.
A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
A date, or null if the input was a string that could not be cast to a date
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.431Z
(lead expr offset)
(lead expr offset default)
Params: (columnName: String, offset: Int)
Result: Column
Window function: returns the value that is offset rows after the current row, and null if there are fewer than offset rows after the current row. For example, an offset of one will return the next row at any given point in the window partition.
This is equivalent to the LEAD function in SQL.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.437Z
(least & exprs)
Params: (exprs: Column*)
Result: Column
Returns the least value of the list of values, skipping null values. This function takes at least 2 parameters. It will return null iff all parameters are null.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.439Z
(length expr)
Params: (e: Column)
Result: Column
Computes the character length of a given string or number of bytes of a binary string. The length of character strings includes the trailing spaces. The length of binary strings includes binary zeros.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.440Z
(levenshtein left-expr right-expr)
Params: (l: Column, r: Column)
Result: Column
Computes the Levenshtein distance of the two given string columns.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.441Z
(locate substr expr)
Params: (substr: String, str: Column)
Result: Column
Locate the position of the first occurrence of substr.
1.5.0
The position is not zero based, but 1 based index. Returns 0 if substr could not be found in str.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.445Z
(log expr)
Params: (e: Column)
Result: Column
Computes the natural logarithm of the given value.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.449Z
(log-10 expr)
Params: (e: Column)
Result: Column
Computes the logarithm of the given value in base 10.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.451Z
(log-1p expr)
Params: (e: Column)
Result: Column
Computes the natural logarithm of the given value plus one.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.453Z
(log-2 expr)
Params: (expr: Column)
Result: Column
Computes the logarithm of the given column in base 2.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.455Z
(log10 expr)
Params: (e: Column)
Result: Column
Computes the logarithm of the given value in base 10.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.451Z
(log1p expr)
Params: (e: Column)
Result: Column
Computes the natural logarithm of the given value plus one.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.453Z
(log2 expr)
Params: (expr: Column)
Result: Column
Computes the logarithm of the given column in base 2.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.455Z
(lower expr)
Params: (e: Column)
Result: Column
Converts a string column to lower case.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.457Z
(lpad expr length pad)
Params: (str: Column, len: Int, pad: String)
Result: Column
Left-pad the string column with pad to a length of len. If the string column is longer than len, the return value is shortened to len characters.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.458Z
(ltrim expr)
Params: (e: Column)
Result: Column
Trim the spaces from left end for the specified string value.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.460Z
(map & exprs)
Params: (cols: Column*)
Result: Column
Creates a new map column. The input columns must be grouped as key-value pairs, e.g. (key1, value1, key2, value2, ...). The key columns must all have the same data type, and can't be null. The value columns must all have the same data type.
2.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.461Z
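A minimal Scala sketch of the underlying Spark function; df and the w/h columns are hypothetical.
  import org.apache.spark.sql.functions.{col, lit, map}
  // Alternating key/value columns build a single map column.
  df.select(map(lit("width"), col("w"), lit("height"), col("h")).as("dims"))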
(map-concat & exprs)
Params: (cols: Column*)
Result: Column
Returns the union of all the given maps.
2.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.462Z
(map-entries expr)
Params: (e: Column)
Result: Column
Returns an unordered array of all entries in the given map.
3.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.463Z
(map-filter expr predicate)
Params: (expr: Column, f: (Column, Column) ⇒ Column)
Result: Column
Returns a map whose key-value pairs satisfy a predicate.
the input map column
(key, value) => predicate, the Boolean predicate to filter the input map column
3.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.465Z
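A minimal Scala sketch of the underlying Spark function; df, the scores map column, and the threshold are hypothetical.
  import org.apache.spark.sql.functions.{col, lit, map_filter}
  // Keep only the entries whose value satisfies the predicate.
  df.select(map_filter(col("scores"), (k, v) => v > lit(0.5)).as("high_scores"))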
(map-from-arrays key-expr val-expr)
Params: (keys: Column, values: Column)
Result: Column
Creates a new map column. The array in the first column is used for keys. The array in the second column is used for values. All elements in the array for key should not be null.
2.4
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.470Z
(map-from-entries expr)
Params: (e: Column)
Result: Column
Returns a map created from the given array of entries.
2.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.471Z
(map-keys expr)
Params: (e: Column)
Result: Column
Returns an unordered array containing the keys of the map.
2.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.472Z
(map-values expr)
Params: (e: Column)
Result: Column
Returns an unordered array containing the values of the map.
2.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.473Z
(map-zip-with left right merge-fn)
Params: (left: Column, right: Column, f: (Column, Column, Column) ⇒ Column)
Result: Column
Merge two given maps, key-wise into a single map using a function.
the left input map column
the right input map column
(key, value1, value2) => new_value, the lambda function to merge the map values
3.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.474Z
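A minimal Scala sketch of the underlying Spark function; df and the base/bonus map columns are hypothetical.
  import org.apache.spark.sql.functions.{col, map_zip_with}
  // Merge two maps key-wise; a key present in only one map sees null on the other side.
  df.select(map_zip_with(col("base"), col("bonus"), (k, v1, v2) => v1 + v2).as("total"))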
(md-5 expr)
Params: (e: Column)
Result: Column
Calculates the MD5 digest of a binary column and returns the value as a 32 character hex string.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.478Z
(md5 expr)
Params: (e: Column)
Result: Column
Calculates the MD5 digest of a binary column and returns the value as a 32 character hex string.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.478Z
(minute expr)
Params: (e: Column)
Result: Column
Extracts the minutes as an integer from a given date/timestamp/string.
An integer, or null if the input was a string that could not be cast to a date
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.483Z
(monotonically-increasing-id)
Params: ()
Result: Column
A column expression that generates monotonically increasing 64-bit integers.
The generated ID is guaranteed to be monotonically increasing and unique, but not consecutive. The current implementation puts the partition ID in the upper 31 bits, and the record number within each partition in the lower 33 bits. The assumption is that the data frame has less than 1 billion partitions, and each partition has less than 8 billion records.
As an example, consider a DataFrame with two partitions, each with 3 records. This expression would return the following IDs: 0, 1, 2, 8589934592 (1L << 33), 8589934593, 8589934594.
(Since version 2.0.0) Use monotonically_increasing_id()
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.744Z
(month expr)
Params: (e: Column)
Result: Column
Extracts the month as an integer from a given date/timestamp/string.
An integer, or null if the input was a string that could not be cast to a date
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.486Z
(months-between l-expr r-expr)
Params: (end: Column, start: Column)
Result: Column
Returns number of months between dates start and end.
A whole number is returned if both inputs have the same day of month or both are the last day of their respective months. Otherwise, the difference is calculated assuming 31 days per month.
For example, months_between('2018-03-15', '2018-01-15') returns 2.0, since both dates fall on the same day of the month.
A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
A double, or null if either end or start were strings that could not be cast to a timestamp. Negative if end is before start
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.490Z
(nanvl left-expr right-expr)
Params: (col1: Column, col2: Column)
Result: Column
Returns col1 if it is not NaN, or col2 if col1 is NaN.
Both inputs should be floating point columns (DoubleType or FloatType).
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.492Z
(negate expr)
Params: (e: Column)
Result: Column
Unary minus, i.e. negate the expression.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.494Z
(next-day expr day-of-week)
Params: (date: Column, dayOfWeek: String)
Result: Column
Returns the first date which is later than the value of the date column that is on the specified day of the week.
For example, next_day('2015-07-27', "Sunday") returns 2015-08-02 because that is the first Sunday after 2015-07-27.
A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
Case insensitive, and accepts: "Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"
A date, or null if date was a string that could not be cast to a date or if dayOfWeek was an invalid value
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.495Z
(not expr)
Params: (e: Column)
Result: Column
Inversion of boolean expression, i.e. NOT.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.497Z
(ntile n)
Params: (n: Int)
Result: Column
Window function: returns the ntile group id (from 1 to n inclusive) in an ordered window partition. For example, if n is 4, the first quarter of the rows will get value 1, the second quarter will get 2, the third quarter will get 3, and the last quarter will get 4.
This is equivalent to the NTILE function in SQL.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.500Z
(overlay src rep pos)
(overlay src rep pos len)
Params: (src: Column, replace: Column, pos: Column, len: Column)
Result: Column
Overlay the specified portion of src with replace, starting from byte position pos of src and proceeding for len bytes.
3.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.503Z
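A minimal Scala sketch of the underlying Spark function; df, the sku column, and the replacement text are hypothetical.
  import org.apache.spark.sql.functions.{col, lit, overlay}
  // Replace part of the string starting at (1-based) position 3 with "XX".
  df.select(overlay(col("sku"), lit("XX"), lit(3)).as("masked_sku"))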
(percent-rank)
Params: ()
Result: Column
Window function: returns the relative rank (i.e. percentile) of rows within a window partition.
This is computed by: (rank of row in its partition - 1) / (number of rows in the partition - 1).
This is equivalent to the PERCENT_RANK function in SQL.
1.6.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.504Z
(pi)
The double value that is closer than any other to pi, the ratio of the circumference of a circle to its diameter.
(pmod left-expr right-expr)
Params: (dividend: Column, divisor: Column)
Result: Column
Returns the positive value of dividend mod divisor.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.505Z
(posexplode expr)
Params: (e: Column)
Result: Column
Creates a new row for each element with position in the given array or map column. Uses the default column name pos for position, and col for elements in the array and key and value for elements in the map unless specified otherwise.
2.1.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.506Z
(posexplode-outer expr)
Params: (e: Column)
Result: Column
Creates a new row for each element with position in the given array or map column. Uses the default column name pos for position, and col for elements in the array and key and value for elements in the map unless specified otherwise. Unlike posexplode, if the array or map is null or empty then the row (null, null) is produced.
2.1.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.506Z
(pow base exponent)
Params: (l: Column, r: Column)
Result: Column
Returns the value of the first argument raised to the power of the second argument.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.520Z
(quarter expr)
Params: (e: Column)
Result: Column
Extracts the quarter as an integer from a given date/timestamp/string.
An integer, or null if the input was a string that could not be cast to a date
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.521Z
(radians expr)
Params: (e: Column)
Result: Column
Converts an angle measured in degrees to an approximately equivalent angle measured in radians.
angle in degrees
angle in radians, as if computed by java.lang.Math.toRadians
2.1.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.523Z
(rand)
(rand seed)
Params: (seed: Long)
Result: Column
Generate a random column with independent and identically distributed (i.i.d.) samples uniformly distributed in [0.0, 1.0).
1.4.0
The function is non-deterministic in the general case.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.526Z
(randn)
(randn seed)
Params: (seed: Long)
Result: Column
Generate a column with independent and identically distributed (i.i.d.) samples from the standard normal distribution.
1.4.0
The function is non-deterministic in the general case.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.528Z
(rank)
Params: ()
Result: Column
Window function: returns the rank of rows within a window partition.
The difference between rank and dense_rank is that dense_rank leaves no gaps in the ranking sequence when there are ties. That is, if you were ranking a competition using dense_rank and had three people tie for second place, you would say that all three were in second place and that the next person came in third. Rank, by contrast, gives sequential numbers, so the person that came in after the ties would register as coming in fifth.
This is equivalent to the RANK function in SQL.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.529Z
(regexp-extract expr regex idx)
Params: (e: Column, exp: String, groupIdx: Int)
Result: Column
Extract a specific group matched by a Java regex, from the specified string column. If the regex did not match, or the specified group did not match, an empty string is returned.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.530Z
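A minimal Scala sketch of the underlying Spark function; df, the log_line column, and the pattern are hypothetical.
  import org.apache.spark.sql.functions.{col, regexp_extract}
  // Pull capture group 1 out of each line; non-matching rows yield an empty string.
  df.select(regexp_extract(col("log_line"), "status=(\\d{3})", 1).as("status_code"))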
(regexp-replace expr pattern-expr replacement-expr)
Params: (e: Column, pattern: String, replacement: String)
Result: Column
Replace all substrings of the specified string value that match regexp with rep.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.532Z
(reverse expr)
Params: (e: Column)
Result: Column
Returns a reversed string or an array with reverse order of elements.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.534Z
(rint expr)
Params: (e: Column)
Result: Column
Returns the double value that is closest in value to the argument and is equal to a mathematical integer.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.536Z
(round expr)
Params: (e: Column)
Result: Column
Returns the value of the column e rounded to 0 decimal places with HALF_UP round mode.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.539Z
(row-number)
Params: ()
Result: Column
Window function: returns a sequential number starting at 1 within a window partition.
1.6.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.540Z
(rpad expr length pad)
Params: (str: Column, len: Int, pad: String)
Result: Column
Right-pad the string column with pad to a length of len. If the string column is longer than len, the return value is shortened to len characters.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.541Z
(rtrim expr)
Params: (e: Column)
Result: Column
Trim the spaces from right end for the specified string value.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.543Z
(schema-of-csv expr)
(schema-of-csv expr options)
Params: (csv: String)
Result: Column
Parses a CSV string and infers its schema in DDL format.
a CSV string.
3.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.547Z
(schema-of-json expr)
(schema-of-json expr options)
Params: (json: String)
Result: Column
Parses a JSON string and infers its schema in DDL format.
a JSON string.
2.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.554Z
(second expr)
Params: (e: Column)
Result: Column
Extracts the seconds as an integer from a given date/timestamp/string.
An integer, or null if the input was a string that could not be cast to a timestamp
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.555Z
(sequence start stop step)
Params: (start: Column, stop: Column, step: Column)
Result: Column
Generate a sequence of integers from start to stop, incrementing by step.
2.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.557Z
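A minimal Scala sketch of the underlying Spark function; df and the literal bounds are hypothetical.
  import org.apache.spark.sql.functions.{lit, sequence}
  // Produces the array [1, 3, 5, 7, 9] for every row.
  df.select(sequence(lit(1), lit(9), lit(2)).as("odds"))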
(sha-1 expr)
Params: (e: Column)
Result: Column
Calculates the SHA-1 digest of a binary column and returns the value as a 40 character hex string.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.558Z
(sha-2 expr n-bits)
Params: (e: Column, numBits: Int)
Result: Column
Calculates the SHA-2 family of hash functions of a binary column and returns the value as a hex string.
column to compute SHA-2 on.
one of 224, 256, 384, or 512.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.559Z
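A minimal Scala sketch of the underlying Spark function; df and the email column are hypothetical.
  import org.apache.spark.sql.functions.{col, sha2}
  // 256-bit SHA-2 digest of the column, returned as a hex string.
  df.select(sha2(col("email"), 256).as("email_hash"))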
(sha1 expr)
Params: (e: Column)
Result: Column
Calculates the SHA-1 digest of a binary column and returns the value as a 40 character hex string.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.558Z
(sha2 expr n-bits)
Params: (e: Column, numBits: Int)
Result: Column
Calculates the SHA-2 family of hash functions of a binary column and returns the value as a hex string.
column to compute SHA-2 on.
one of 224, 256, 384, or 512.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.559Z
(shift-left expr num-bits)
Params: (e: Column, numBits: Int)
Result: Column
Shift the given value numBits left. If the given value is a long value, this function will return a long value else it will return an integer value.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.560Z
(shift-right expr num-bits)
Params: (e: Column, numBits: Int)
Result: Column
(Signed) shift the given value numBits right. If the given value is a long value, it will return a long value; otherwise it will return an integer value.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.562Z
(shift-right-unsigned expr num-bits)
Params: (e: Column, numBits: Int)
Result: Column
Unsigned shift the given value numBits right. If the given value is a long value, it will return a long value; otherwise it will return an integer value.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.563Z
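A hedged Scala sketch contrasting the three shift variants above via the underlying Spark functions; df and the integer column "flags" are hypothetical:
import org.apache.spark.sql.functions.{col, shiftLeft, shiftRight, shiftRightUnsigned}
df.select(
  shiftLeft(col("flags"), 2),            // flags << 2
  shiftRight(col("flags"), 1),           // arithmetic shift, sign bit preserved
  shiftRightUnsigned(col("flags"), 1))   // logical shift, vacated bits filled with zero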
(sign expr)
Params: (e: Column)
Result: Column
Computes the signum of the given value.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.566Z
(signum expr)
Params: (e: Column)
Result: Column
Computes the signum of the given value.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.566Z
(sin expr)
Params: (e: Column)
Result: Column
angle in radians
sine of the angle, as if computed by java.lang.Math.sin
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.568Z
(sinh expr)
Params: (e: Column)
Result: Column
hyperbolic angle
hyperbolic sine of the given value, as if computed by java.lang.Math.sinh
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.570Z
(size expr)
Params: (e: Column)
Result: Column
Returns length of array or map.
The function returns null for null input if spark.sql.legacy.sizeOfNull is set to false or spark.sql.ansi.enabled is set to true. Otherwise, the function returns -1 for null input. With the default settings, the function returns -1 for null input.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.571Z
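As a sketch of the underlying size function in Spark's Scala API; df and the array column "tags" are hypothetical:
import org.apache.spark.sql.functions.{col, size}
// with the default settings a null array yields -1 rather than null
df.select(size(col("tags")).as("n_tags"))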
(skewness expr)
Params: (e: Column)
Result: Column
Aggregate function: returns the skewness of the values in a group.
1.6.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.574Z
(slice expr start length)
Params: (x: Column, start: Int, length: Int)
Result: Column
Returns an array containing all the elements in x from index start (or starting from the end if start is negative) with the specified length.
the array column to be sliced
the starting index
the length of the slice
2.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.575Z
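A minimal Scala sketch of the underlying slice function; df and the array column "xs" are hypothetical:
import org.apache.spark.sql.functions.{col, slice}
df.select(
  slice(col("xs"), 1, 2),    // the first two elements (indices are 1-based)
  slice(col("xs"), -2, 2))   // the last two elements (negative start counts from the end)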
(sort-array expr)
(sort-array expr asc)
Params: (e: Column)
Result: Column
Sorts the input array for the given column in ascending order, according to the natural ordering of the array elements. Null elements will be placed at the beginning of the returned array.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.577Z
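A sketch of the underlying sort_array function; df and the array column "xs" are hypothetical:
import org.apache.spark.sql.functions.{col, sort_array}
df.select(
  sort_array(col("xs")),          // ascending, nulls placed first
  sort_array(col("xs"), false))   // descending, nulls placed last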
(soundex expr)
Params: (e: Column)
Result: Column
Returns the soundex code for the specified expression.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.578Z
(spark-partition-id)
Params: ()
Result: Column
Partition ID.
1.6.0
This is non-deterministic because it depends on data partitioning and task scheduling.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.579Z
(split expr pattern)
Params: (str: Column, pattern: String)
Result: Column
Splits str around matches of the given pattern.
a string expression to split
a string representing a regular expression. The regex string should be a Java regular expression.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.582Z
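For illustration, the underlying split function with a Java regular expression; df and the column "line" are hypothetical:
import org.apache.spark.sql.functions.{col, split}
// split on a comma or a semicolon
df.select(split(col("line"), "[,;]").as("fields"))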
(sqr expr)
Returns the value of the first argument raised to the power of two.
(sqrt expr)
Params: (e: Column)
Result: Column
Computes the square root of the specified float value.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.584Z
(std expr)
Params: (e: Column)
Result: Column
Aggregate function: alias for stddev_samp.
1.6.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.586Z
(stddev expr)
Params: (e: Column)
Result: Column
Aggregate function: alias for stddev_samp.
1.6.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.586Z
(stddev-pop expr)
Params: (e: Column)
Result: Column
Aggregate function: returns the population standard deviation of the expression in a group.
1.6.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.593Z
(stddev-samp expr)
Params: (e: Column)
Result: Column
Aggregate function: returns the sample standard deviation of the expression in a group.
1.6.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.586Z
(struct & exprs)
Params: (cols: Column*)
Result: Column
Creates a new struct column. If the input column is a column in a DataFrame, or a derived column expression that is named (i.e. aliased), its name would be retained as the StructField's name, otherwise, the newly generated StructField's name would be auto generated as col with a suffix index + 1, i.e. col1, col2, col3, ...
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.597Z
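A hedged Scala sketch of the underlying struct function showing how field names are retained; df and the columns "id" and "amount" are hypothetical:
import org.apache.spark.sql.functions.{col, struct}
// fields keep the names "id" and "total"; an unnamed expression would become col1, col2, ...
df.select(struct(col("id"), col("amount").as("total")).as("order"))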
(substring expr pos len)
Params: (str: Column, pos: Int, len: Int)
Result: Column
Substring starts at pos and is of length len when str is of String type; when str is of Binary type, it returns the slice of the byte array that starts at pos and is of length len bytes.
1.5.0
The position is not zero-based, but a 1-based index.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.599Z
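A small sketch of the underlying substring function, illustrating the 1-based position; df and the column "s" are hypothetical:
import org.apache.spark.sql.functions.{col, substring}
// take 4 characters starting at the first character
df.select(substring(col("s"), 1, 4))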
(substring-index expr delim cnt)
Params: (str: Column, delim: String, count: Int)
Result: Column
Returns the substring from string str before count occurrences of the delimiter delim. If count is positive, everything to the left of the final delimiter (counting from the left) is returned. If count is negative, everything to the right of the final delimiter (counting from the right) is returned. substring_index performs a case-sensitive match when searching for delim.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.600Z
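A sketch of the underlying substring_index function; df and the column "host" are hypothetical, and the commented results assume a value like "www.apache.org":
import org.apache.spark.sql.functions.{col, substring_index}
df.select(
  substring_index(col("host"), ".", 2),    // "www.apache"
  substring_index(col("host"), ".", -1))   // "org"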
(sum-distinct expr)
Params: (e: Column)
Result: Column
Aggregate function: returns the sum of distinct values in the expression.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.604Z
(tan expr)
Params: (e: Column)
Result: Column
angle in radians
tangent of the given value, as if computed by java.lang.Math.tan
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.607Z
(tanh expr)
Params: (e: Column)
Result: Column
hyperbolic angle
hyperbolic tangent of the given value, as if computed by java.lang.Math.tanh
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.610Z
(time-window time-expr duration)
(time-window time-expr duration slide)
(time-window time-expr duration slide start)
Params: (timeColumn: Column, windowDuration: String, slideDuration: String, startTime: String)
Result: Column
Bucketize rows into one or more time windows given a timestamp specifying column. Window starts are inclusive but the window ends are exclusive, e.g. 12:05 will be in the window [12:05,12:10) but not in [12:00,12:05). Windows can support microsecond precision. Windows in the order of months are not supported. As an example (see the sketch after this entry), taking the average stock price for a one-minute window every 10 seconds, starting 5 seconds after the hour, produces windows such as 09:00:05-09:01:05, 09:00:15-09:01:15, and so on.
For a streaming query, you may use the function current_timestamp to generate windows on processing time.
The column or the expression to use as the timestamp for windowing by time. The time column must be of TimestampType.
A string specifying the width of the window, e.g. 10 minutes, 1 second. Check org.apache.spark.unsafe.types.CalendarInterval for valid duration identifiers. Note that the duration is a fixed length of time, and does not vary over time according to a calendar. For example, 1 day always means 86,400,000 milliseconds, not a calendar day.
A string specifying the sliding interval of the window, e.g. 1 minute. A new window will be generated every slideDuration. Must be less than or equal to the windowDuration. Check org.apache.spark.unsafe.types.CalendarInterval for valid duration identifiers. This duration is likewise absolute, and does not vary according to a calendar.
The offset with respect to 1970-01-01 00:00:00 UTC with which to start window intervals. For example, in order to have hourly tumbling windows that start 15 minutes past the hour, e.g. 12:15-13:15, 13:15-14:15... provide startTime as 15 minutes.
2.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.732Z
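The stock-price example referenced above, sketched with the underlying Spark Scala window function; df and the columns "timestamp", "stockId" and "price" are hypothetical:
import org.apache.spark.sql.functions.{avg, col, window}
// one-minute windows, sliding every 10 seconds, with a start offset of 5 seconds
df.groupBy(window(col("timestamp"), "1 minute", "10 seconds", "5 seconds"), col("stockId"))
  .agg(avg(col("price")))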
(to-csv expr)
(to-csv expr options)
Params: (e: Column, options: Map[String, String])
Result: Column
(Java-specific) Converts a column containing a StructType into a CSV string with the specified schema. Throws an exception in the case of an unsupported type.
a column containing a struct.
options to control how the struct column is converted into a CSV string. It accepts the same options as the CSV data source.
3.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.613Z
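A minimal sketch of the underlying to_csv function (no-options overload); df and the columns "id" and "name" are hypothetical:
import org.apache.spark.sql.functions.{col, struct, to_csv}
// render a struct of two fields as a single CSV string column
df.select(to_csv(struct(col("id"), col("name"))))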
(to-date expr)
(to-date expr date-format)
Params: (e: Column)
Result: Column
Converts the column into DateType by casting rules to DateType.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.616Z
(to-timestamp expr)
(to-timestamp expr date-format)
Params: (s: Column)
Result: Column
Converts to a timestamp by casting rules to TimestampType.
A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
A timestamp, or null if the input was a string that could not be cast to a timestamp
2.2.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.623Z
(to-utc-timestamp expr)
Params: (ts: Column, tz: String)
Result: Column
Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in the given time zone, and renders that time as a timestamp in UTC. For example, 'GMT+1' would yield '2017-07-14 01:40:00.0'.
A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
A string detailing the time zone ID that the input should be adjusted to. It should be in the format of either region-based zone IDs or zone offsets. Region IDs must have the form 'area/city', such as 'America/Los_Angeles'. Zone offsets must be in the format '(+|-)HH:mm', for example '-08:00' or '+01:00'. Also 'UTC' and 'Z' are supported as aliases of '+00:00'. Other short names are not recommended to use because they can be ambiguous.
A timestamp, or null if ts was a string that could not be cast to a timestamp or tz was an invalid value
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.626Z
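A sketch of the underlying to_utc_timestamp function; df, the column "ts" and the chosen zone ID are hypothetical:
import org.apache.spark.sql.functions.{col, to_utc_timestamp}
// interpret ts as Asia/Seoul local time and render it as a UTC timestamp
df.select(to_utc_timestamp(col("ts"), "Asia/Seoul"))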
(transform expr xform-fn)
Params: (column: Column, f: (Column) ⇒ Column)
Result: Column
Returns an array of elements after applying a transformation to each element in the input array.
the input array column
col => transformed_col, the lambda function to transform the input column
3.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.629Z
(transform-keys expr key-fn)
Params: (expr: Column, f: (Column, Column) ⇒ Column)
Result: Column
Applies a function to every key-value pair in a map and returns a map with the results of those applications as the new keys for the pairs.
the input map column
(key, value) => new_key, the lambda function to transform the key of input map column
3.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.630Z
(transform-values expr key-fn)
Params: (expr: Column, f: (Column, Column) ⇒ Column)
Result: Column
Applies a function to every key-value pair in a map and returns a map with the results of those applications as the new values for the pairs.
the input map column
(key, value) => new_value, the lambda function to transform the value of input map column
3.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.638Z
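A hedged Scala sketch covering the three higher-order functions above (transform, transform_keys, transform_values); df, the array column "xs" and the map column "m" are hypothetical:
import org.apache.spark.sql.functions.{col, transform, transform_keys, transform_values, upper}
df.select(
  transform(col("xs"), x => x * 2),                // double every array element
  transform_keys(col("m"), (k, v) => upper(k)),    // upper-case every map key
  transform_values(col("m"), (k, v) => v + 1))     // increment every map value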
(translate expr match replacement)
Params: (src: Column, matchingString: String, replaceString: String)
Result: Column
Translates characters in src: each character that appears in matchingString is replaced by the character at the corresponding position in replaceString. The translation happens whenever a character in the string matches a character in matchingString.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.639Z
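For illustration, the underlying translate function; df and the column "s" are hypothetical:
import org.apache.spark.sql.functions.{col, translate}
// '(' becomes '[' and ')' becomes ']' wherever they occur in s
df.select(translate(col("s"), "()", "[]"))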
(trim expr trim-string)
Params: (e: Column)
Result: Column
Trim the spaces from both ends for the specified string column.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.641Z
(unbase-64 expr)
Params: (e: Column)
Result: Column
Decodes a BASE64 encoded string column and returns it as a binary column. This is the reverse of base64.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.702Z
(unbase64 expr)
Params: (e: Column)
Result: Column
Decodes a BASE64 encoded string column and returns it as a binary column. This is the reverse of base64.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.702Z
(unhex expr)
Params: (column: Column)
Result: Column
Inverse of hex. Interprets each pair of characters as a hexadecimal number and converts to the byte representation of number.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.703Z
(unix-timestamp)
(unix-timestamp expr)
(unix-timestamp expr pattern)
Params: ()
Result: Column
Returns the current Unix timestamp (in seconds) as a long.
1.5.0
All calls of unix_timestamp within the same query return the same value (i.e. the current timestamp is calculated at the start of query evaluation).
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.710Z
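A sketch of the underlying unix_timestamp overloads; df and the string column "ts_string" are hypothetical:
import org.apache.spark.sql.functions.{col, unix_timestamp}
df.select(
  unix_timestamp(),                                         // current epoch seconds, constant within the query
  unix_timestamp(col("ts_string"), "yyyy-MM-dd HH:mm:ss"))  // parse a string with an explicit pattern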
(upper expr)
Params: (e: Column)
Result: Column
Converts a string column to upper case.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.712Z
(var-pop expr)
Params: (e: Column)
Result: Column
Aggregate function: returns the population variance of the values in a group.
1.6.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.714Z
(var-samp expr)
Params: (e: Column)
Result: Column
Aggregate function: returns the unbiased variance of the values in a group.
1.6.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.718Z
(variance expr)
Params: (e: Column)
Result: Column
Aggregate function: alias for var_samp.
1.6.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.718Z
(week-of-year expr)
Params: (e: Column)
Result: Column
Extracts the week number as an integer from a given date/timestamp/string.
A week is considered to start on a Monday and week 1 is the first week with more than 3 days, as defined by ISO 8601
An integer, or null if the input was a string that could not be cast to a date
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.723Z
(weekofyear expr)
Params: (e: Column)
Result: Column
Extracts the week number as an integer from a given date/timestamp/string.
A week is considered to start on a Monday and week 1 is the first week with more than 3 days, as defined by ISO 8601
An integer, or null if the input was a string that could not be cast to a date
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.723Z
(when condition if-expr)
(when condition if-expr else-expr)
Params: (condition: Column, value: Any)
Result: Column
Evaluates a list of conditions and returns one of multiple possible result expressions. If otherwise is not defined at the end, null is returned for unmatched conditions.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.724Z
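A minimal sketch of the underlying when/otherwise chain; df and the column "age" are hypothetical:
import org.apache.spark.sql.functions.{col, when}
// without the otherwise clause, unmatched rows would yield null
df.select(
  when(col("age") < 18, "minor")
    .when(col("age") < 65, "adult")
    .otherwise("senior"))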
(window time-expr duration)
(window time-expr duration slide)
(window time-expr duration slide start)
Params: (timeColumn: Column, windowDuration: String, slideDuration: String, startTime: String)
Result: Column
Bucketize rows into one or more time windows given a timestamp specifying column. Window starts are inclusive but the window ends are exclusive, e.g. 12:05 will be in the window [12:05,12:10) but not in [12:00,12:05). Windows can support microsecond precision. Windows in the order of months are not supported. As an example (see the sketch under the time-window entry above), taking the average stock price for a one-minute window every 10 seconds, starting 5 seconds after the hour, produces windows such as 09:00:05-09:01:05, 09:00:15-09:01:15, and so on.
For a streaming query, you may use the function current_timestamp to generate windows on processing time.
The column or the expression to use as the timestamp for windowing by time. The time column must be of TimestampType.
A string specifying the width of the window, e.g. 10 minutes, 1 second. Check org.apache.spark.unsafe.types.CalendarInterval for valid duration identifiers. Note that the duration is a fixed length of time, and does not vary over time according to a calendar. For example, 1 day always means 86,400,000 milliseconds, not a calendar day.
A string specifying the sliding interval of the window, e.g. 1 minute. A new window will be generated every slideDuration. Must be less than or equal to the windowDuration. Check org.apache.spark.unsafe.types.CalendarInterval for valid duration identifiers. This duration is likewise absolute, and does not vary according to a calendar.
The offset with respect to 1970-01-01 00:00:00 UTC with which to start window intervals. For example, in order to have hourly tumbling windows that start 15 minutes past the hour, e.g. 12:15-13:15, 13:15-14:15... provide startTime as 15 minutes.
2.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.732Z
(xxhash-64 & exprs)
Params: (cols: Column*)
Result: Column
Calculates the hash code of given columns using the 64-bit variant of the xxHash algorithm, and returns the result as a long column.
3.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.733Z
(xxhash64 & exprs)
Params: (cols: Column*)
Result: Column
Calculates the hash code of given columns using the 64-bit variant of the xxHash algorithm, and returns the result as a long column.
3.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.733Z
(year expr)
Params: (e: Column)
Result: Column
Extracts the year as an integer from a given date/timestamp/string.
An integer, or null if the input was a string that could not be cast to a date
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.734Z
(zip-with left right merge-fn)
Params: (left: Column, right: Column, f: (Column, Column) ⇒ Column)
Result: Column
Merge two given arrays, element-wise, into a single array using a function. If one array is shorter, nulls are appended at the end to match the length of the longer array, before applying the function.
the left input array column
the right input array column
(lCol, rCol) => col, the lambda function to merge two input columns into one column
3.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.737Z
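A sketch of the underlying zip_with function; df and the array columns "xs" and "ys" are hypothetical:
import org.apache.spark.sql.functions.{col, zip_with}
// element-wise sum; the shorter array is padded with nulls before merging
df.select(zip_with(col("xs"), col("ys"), (x, y) => x + y))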