(! expr)
Params: (e: Column)
Result: Column
Inversion of boolean expression, i.e. NOT.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.497Z
(% left-expr right-expr)
Params: (other: Any)
Result: Column
Modulo (a.k.a. remainder) expression.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.822Z
(& left-expr right-expr)
Params: (other: Any)
Result: Column
Compute bitwise AND of this expression with another expression.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.878Z
(&& & exprs)
Params: (other: Any)
Result: Column
Boolean AND.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.824Z
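A minimal usage sketch for the boolean operators above, assuming a namespace alias `g` for this library, a dataframe `df` bound elsewhere, hypothetical boolean columns `:active` and `:banned`, and the library's `filter` (documented elsewhere):

```clojure
;; Keep rows that are active and not banned.
(-> df
    (g/filter (g/&& :active (g/! :banned))))
```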
(* & exprs)
Params: (other: Any)
Result: Column
Multiplication of this expression and another expression.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.827Z
(** base exponent)
Params: (l: Column, r: Column)
Result: Column
Returns the value of the first argument raised to the power of the second argument.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.520Z
(+ & exprs)
Params: (other: Any)
Result: Column
Sum of this expression and another expression.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.829Z
(- & exprs)
Params: (other: Any)
Result: Column
Subtraction. Subtract the other expression from this expression.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.957Z
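A small sketch combining the arithmetic column functions above, assuming the `g` alias, a dataframe `df`, hypothetical numeric columns `:price`, `:quantity` and `:discount`, and `with-column` (documented elsewhere):

```clojure
(-> df
    (g/with-column :total     (g/* :price :quantity))
    (g/with-column :net-total (g/- (g/* :price :quantity) :discount))
    (g/with-column :remainder (g/% :quantity 3)))
```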
(->col-array args)
Coerce a coll of coerceable values into a coll of columns.
Params: (colName: String)
Result: Column
Returns a Column based on the given column name.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.258Z
Create a Dataset from a path or a collection of records.
(->date-col expr)
(->date-col expr date-format)
Params: (e: Column)
Result: Column
Converts the column into DateType by casting rules to DateType.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.616Z
Coerce to string, useful for debugging.
(->kebab-columns dataset)
Returns a new Dataset with all columns renamed to kebab case.
(->schema value)
Coerces plain Clojure data structures to a Spark schema.

```clojure
(-> {:x [:short]
     :y [:string :int]
     :z {:a :float :b :double}}
    g/->schema
    g/->string)
=> StructType(
     StructField(x,ArrayType(ShortType,true),true),
     StructField(y,MapType(StringType,IntegerType,true),true),
     StructField(
       z,
       StructType(
         StructField(a,FloatType,true),
         StructField(b,DoubleType,true)
       ),
       true
     )
   )
```
(->timestamp-col expr)
(->timestamp-col expr date-format)
Params: (s: Column)
Result: Column
Converts to a timestamp by casting rules to TimestampType.
A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
A timestamp, or null if the input was a string that could not be cast to a timestamp
2.2.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.623Z
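A hedged sketch of the date and timestamp coercions, assuming a dataframe `df` with a hypothetical string column `:raw-ts` such as "2020-10-19 01:56:22":

```clojure
(-> df
    (g/with-column :event-date (g/->date-col :raw-ts "yyyy-MM-dd HH:mm:ss"))
    (g/with-column :event-ts   (g/->timestamp-col :raw-ts "yyyy-MM-dd HH:mm:ss")))
```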
(->utc-timestamp expr)
Params: (ts: Column, tz: String)
Result: Column
Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in the given time zone, and renders that time as a timestamp in UTC. For example, 'GMT+1' would yield '2017-07-14 01:40:00.0'.
A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
A string detailing the time zone ID that the input should be adjusted to. It should be in the format of either region-based zone IDs or zone offsets. Region IDs must have the form 'area/city', such as 'America/Los_Angeles'. Zone offsets must be in the format '(+|-)HH:mm', for example '-08:00' or '+01:00'. Also 'UTC' and 'Z' are supported as aliases of '+00:00'. Other short names are not recommended to use because they can be ambiguous.
A timestamp, or null if ts was a string that could not be cast to a timestamp or tz was an invalid value
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.626Z
(/ & exprs)
Params: (other: Any)
Result: Column
Division of this expression by another expression.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.832Z
Params: (other: Any)
Result: Column
Less than.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.834Z
Params: (other: Any)
Result: Column
Less than or equal to.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.836Z
Params: (other: Any)
Result: Column
Equality test that is safe for null values.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.838Z
Params: (other: Any)
Result: Column
Equality test.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.843Z
Params: (other: Any)
Result: Column
Inequality test.
2.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.840Z
Params: (other: Any)
Result: Column
Equality test.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.843Z
Params: (other: Any)
Result: Column
Greater than.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.845Z
Params: (other: Any)
Result: Column
Greater than or equal to an expression.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.847Z
(abs expr)
Params: (e: Column)
Result: Column
Computes the absolute value of a numeric value.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.169Z
(acos expr)
Params: (e: Column)
Result: Column
inverse cosine of e in radians, as if computed by java.lang.Math.acos
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.171Z
(add cms item)
(add cms item cnt)
Params: (item: Any)
Result: Unit
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/util/sketch/CountMinSketch.html
Timestamp: 2020-10-19T01:56:26.095Z
(add-months expr months)
Params: (startDate: Column, numMonths: Int)
Result: Column
Returns the date that is numMonths after startDate.
A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
The number of months to add to startDate, can be negative to subtract months
A date, or null if startDate was a string that could not be cast to a date
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.174Z
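For example, on a hypothetical `:start-date` column of `df`:

```clojure
;; Twelve months after :start-date; a negative value would subtract months.
(-> df
    (g/with-column :renewal-date (g/add-months :start-date 12)))
```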
(agg dataframe & args)
Params: (aggExpr: (String, String), aggExprs: (String, String)*)
Result: DataFrame
(Scala-specific) Aggregates on the entire Dataset without groups.
2.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.739Z
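A sketch of aggregating the whole dataset without groups, assuming hypothetical columns `:user-id` and `:tag`; the map form (new column name to aggregate expression) follows the library's usual convention, so treat the exact shape as an assumption:

```clojure
(g/agg df
       {:n-users  (g/count-distinct :user-id)
        :all-tags (g/collect-set :tag)})
```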
(agg-all dataframe agg-fn)
Aggregates on all columns of the entire Dataset without groups.
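For example, applying one aggregate function to every column at once:

```clojure
;; Distinct counts for every column of `df`.
(g/agg-all df g/count-distinct)
```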
(aggregate expr init merge-fn)
(aggregate expr init merge-fn finish-fn)
Params: (expr: Column, initialValue: Column, merge: (Column, Column) ⇒ Column, finish: (Column) ⇒ Column)
Result: Column
Applies a binary operator to an initial state and all elements in the array, and reduces this to a single state. The final state is converted into the final result by applying a finish function.
the input array column
the initial value
(combined_value, input_value) => combined_value, the merge function to merge an input value to the combined_value
combined_value => final_value, the lambda function to convert the combined value of all inputs to final result
3.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.177Z
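A hedged sketch of folding an array column, assuming a hypothetical array column `:xs` and `g/lit` (documented elsewhere):

```clojure
;; Sum the elements of :xs, starting from 0.
(-> df
    (g/with-column :sum-xs
      (g/aggregate :xs (g/lit 0) (fn [acc x] (g/+ acc x)))))
```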
Column: Gives the column an alias.
Dataset: Returns a new Dataset with an alias set.
(app-name)
(app-name spark)
Params:
Result: String
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html
Timestamp: 2020-10-19T01:56:49.487Z
(approx-count-distinct expr)
(approx-count-distinct expr rsd)
Params: (e: Column)
Result: Column
Aggregate function: returns the approximate number of distinct items in a group. (The underlying Spark approxCountDistinct is deprecated since 2.1.0 in favour of approx_count_distinct.)
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.742Z
(approx-quantile dataframe col-or-cols probs rel-error)
Params: (col: String, probabilities: Array[Double], relativeError: Double)
Result: Array[Double]
Calculates the approximate quantiles of a numerical column of a DataFrame.
The result of this algorithm has the following deterministic bound: If the DataFrame has N elements and if we request the quantile at probability p up to error err, then the algorithm will return a sample x from the DataFrame so that the exact rank of x is close to (p * N). More precisely, floor((p - err) * N) <= rank(x) <= ceil((p + err) * N).
This method implements a variation of the Greenwald-Khanna algorithm (with some speed optimizations). The algorithm was first present in Space-efficient Online Computation of Quantile Summaries by Greenwald and Khanna.
the name of the numerical column
a list of quantile probabilities Each number must belong to [0, 1]. For example 0 is the minimum, 0.5 is the median, 1 is the maximum.
The relative target precision to achieve (greater than or equal to 0). If set to zero, the exact quantiles are computed, which could be very expensive. Note that values greater than 1 are accepted but give the same result as 1.
the approximate quantiles at the given probabilities
2.0.0
null and NaN values will be removed from the numerical column before calculation. If the dataframe is empty or the column only contains null or NaN, an empty array is returned.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/DataFrameStatFunctions.html
Timestamp: 2020-10-19T01:56:24.640Z
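For example, quartiles of a hypothetical numeric column `:price` within 1% relative error:

```clojure
(g/approx-quantile df :price [0.25 0.5 0.75] 0.01)
;; => a vector of three doubles
```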
(array & exprs)
Params: (cols: Column*)
Result: Column
Creates a new array column. The input columns must all have the same data type.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.184Z
(array-contains expr value)
Params: (column: Column, value: Any)
Result: Column
Returns null if the array is null, true if the array contains value, and false otherwise.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.185Z
(array-distinct expr)
Params: (e: Column)
Result: Column
Removes duplicate values from the array.
2.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.186Z
(array-except left right)
Params: (col1: Column, col2: Column)
Result: Column
Returns an array of the elements in the first array but not in the second array, without duplicates. The order of elements in the result is not determined
2.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.188Z
(array-intersect left right)
Params: (col1: Column, col2: Column)
Result: Column
Returns an array of the elements in the intersection of the given two arrays, without duplicates.
2.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.189Z
(array-join expr delimiter)
(array-join expr delimiter null-replacement)
Params: (column: Column, delimiter: String, nullReplacement: String)
Result: Column
Concatenates the elements of column using the delimiter. Null values are replaced with nullReplacement.
2.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.194Z
(array-max expr)
Params: (e: Column)
Result: Column
Returns the maximum value in the array.
2.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.195Z
(array-min expr)
Params: (e: Column)
Result: Column
Returns the minimum value in the array.
2.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.197Z
(array-position expr value)
Params: (column: Column, value: Any)
Result: Column
Locates the position of the first occurrence of the value in the given array as long. Returns null if either of the arguments are null.
2.4.0
The position is not zero based, but 1 based index. Returns 0 if value could not be found in array.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.198Z
(array-remove expr element)
Params: (column: Column, element: Any)
Result: Column
Removes all elements equal to element from the given array.
2.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.199Z
(array-repeat left right)
Params: (left: Column, right: Column)
Result: Column
Creates an array containing the left argument repeated the number of times given by the right argument.
2.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.201Z
(array-sort expr)
Params: (e: Column)
Result: Column
Sorts the input array in ascending order. The elements of the input array must be orderable. Null elements will be placed at the end of the returned array.
2.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.202Z
(array-type val-type nullable)
Creates an ArrayType by specifying the data type of elements `val-type` and whether the array contains null values `nullable`.
(array-union left right)
Params: (col1: Column, col2: Column)
Result: Column
Returns an array of the elements in the union of the given two arrays, without duplicates.
2.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.204Z
(arrays-overlap left right)
Params: (a1: Column, a2: Column)
Result: Column
Returns true if a1 and a2 have at least one non-null element in common. If not and both the arrays are non-empty and any of them contains a null, it returns null. It returns false otherwise.
2.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.209Z
(arrays-zip & exprs)
Params: (e: Column*)
Result: Column
Returns a merged array of structs in which the N-th struct contains all N-th values of input arrays.
2.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.211Z
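A sketch tying a few of the array functions above together on a dataframe `df`; the literal wrapping via `g/lit` (documented elsewhere) is an assumption:

```clojure
(-> df
    (g/with-column :xs       (g/array (g/lit 3) (g/lit 1) (g/lit 2)))
    (g/with-column :sorted   (g/array-sort :xs))
    (g/with-column :has-two? (g/array-contains :xs 2))
    (g/with-column :pairs    (g/arrays-zip :xs :sorted)))
```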
Column: Gives the column an alias.
Dataset: Returns a new Dataset with an alias set.
(asc expr)
Params:
Result: Column
Returns a sort expression based on ascending order of the column.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.867Z
(asc-nulls-first expr)
Params:
Result: Column
Returns a sort expression based on ascending order of the column, and null values return before non-null values.
2.1.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.869Z
(asc-nulls-last expr)
Params:
Result: Column
Returns a sort expression based on ascending order of the column, and null values appear after non-null values.
2.1.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.870Z
(ascii expr)
Params: (e: Column)
Result: Column
Computes the numeric value of the first character of the string column, and returns the result as an int column.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.216Z
(asin expr)
Params: (e: Column)
Result: Column
inverse sine of e in radians, as if computed by java.lang.Math.asin
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.219Z
Column: variadic version of `map-concat`. Dataset: variadic version of `with-column`.
(atan expr)
Params: (e: Column)
Result: Column
inverse tangent of e, as if computed by java.lang.Math.atan
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.221Z
(atan-2 expr-x expr-y)
Params: (y: Column, x: Column)
Result: Column
coordinate on y-axis
coordinate on x-axis
the theta component of the point (r, theta) in polar coordinates that corresponds to the point (x, y) in Cartesian coordinates, as if computed by java.lang.Math.atan2
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.233Z
(atan2 expr-x expr-y)
Params: (y: Column, x: Column)
Result: Column
coordinate on y-axis
coordinate on x-axis
the theta component of the point (r, theta) in polar coordinates that corresponds to the point (x, y) in Cartesian coordinates, as if computed by java.lang.Math.atan2
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.233Z
(base-64 expr)
Params: (e: Column)
Result: Column
Computes the BASE64 encoding of a binary column and returns it as a string column. This is the reverse of unbase64.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.236Z
(base64 expr)
Params: (e: Column)
Result: Column
Computes the BASE64 encoding of a binary column and returns it as a string column. This is the reverse of unbase64.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.236Z
(between expr lower-bound upper-bound)
Params: (lowerBound: Any, upperBound: Any)
Result: Column
True if the current column is between the lower bound and upper bound, inclusive.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.872Z
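For example, keeping rows whose hypothetical `:age` column lies in [18, 65], using the library's `filter` (documented elsewhere):

```clojure
(-> df
    (g/filter (g/between :age 18 65)))
```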
(bin expr)
Params: (e: Column)
Result: Column
An expression that returns the string representation of the binary value of the given long column. For example, bin("12") returns "1100".
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.238Z
Params: (path: String, minPartitions: Int)
Result: JavaPairRDD[String, PortableDataStream]
Read a directory of binary files from HDFS, a local file system (available on all nodes), or any Hadoop-supported file system URI as a byte array. Each file is read as a single record and returned in a key-value pair, where the key is the path of each file, the value is the content of each file.
A suggestion value of the minimal splitting number for input data.
Small files are preferred; very large files may cause bad performance.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html
Timestamp: 2020-10-19T01:56:49.492Z
(bit-size bloom)
Params: ()
Result: Long
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/util/sketch/BloomFilter.html
Timestamp: 2020-10-19T01:56:25.738Z
(bitwise-and left-expr right-expr)
Params: (other: Any)
Result: Column
Compute bitwise AND of this expression with another expression.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.878Z
(bitwise-not expr)
Params: (e: Column)
Result: Column
Computes bitwise NOT (~) of a number.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.239Z
(bitwise-or left-expr right-expr)
Params: (other: Any)
Result: Column
Compute bitwise OR of this expression with another expression.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.879Z
(bitwise-xor left-expr right-expr)
Params: (other: Any)
Result: Column
Compute bitwise XOR of this expression with another expression.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.881Z
(bloom-filter dataframe expr expected-num-items num-bits-or-fpp)
Params: (colName: String, expectedNumItems: Long, fpp: Double)
Result: BloomFilter
Builds a Bloom filter over a specified column.
name of the column over which the filter is built
expected number of items which will be put into the filter.
expected false positive probability of the filter.
2.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/DataFrameStatFunctions.html
Timestamp: 2020-10-19T01:56:24.647Z
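A sketch that builds a filter over a hypothetical `:user-id` column (about 10,000 expected items, 1% false-positive rate) and probes it on the driver via plain Java interop on the returned BloomFilter:

```clojure
(def user-filter (g/bloom-filter df :user-id 10000 0.01))
(.mightContain user-filter "some-user-id") ;; => true or false
```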
(broadcast dataframe)
Params: (df: Dataset[T])
Result: Dataset[T]
Marks a DataFrame as small enough for use in broadcast joins.
The following example marks the right DataFrame for broadcast hash join using joinKey.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.240Z
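A hedged sketch of the broadcast-join hint, assuming a small lookup table `small-df`, a hypothetical `:join-key` column, and the library's `join` (documented elsewhere):

```clojure
;; Hint that the small lookup table should be broadcast for the join.
(g/join df (g/broadcast small-df) :join-key)
```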
(bround expr)
Params: (e: Column)
Result: Column
Returns the value of the column e rounded to 0 decimal places with HALF_EVEN round mode.
2.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.243Z
(cache dataframe)
Params: ()
Result: Dataset.this.type
Persist this Dataset with the default storage level (MEMORY_AND_DISK).
1.6.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.750Z
(case expr & clauses)
Returns a new Column imitating Clojure's `case` macro behaviour.
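A hedged sketch of the clause style, by analogy with `clojure.core/case`: constant/result pairs with an optional trailing default, over a hypothetical `:n-rooms` column:

```clojure
(-> df
    (g/with-column :size
      (g/case :n-rooms
        1 "small"
        2 "medium"
        "large")))
```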
(cast expr new-type)
Params: (to: DataType)
Result: Column
Casts the column to a different data type.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.885Z
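A sketch of casting a hypothetical `:price` column; passing a Spark type name as a string is an assumption:

```clojure
(-> df
    (g/with-column :price-dbl (g/cast :price "double")))
```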
(cbrt expr)
Params: (e: Column)
Result: Column
Computes the cube-root of the given value.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.253Z
(ceil expr)
Params: (e: Column)
Result: Column
Computes the ceiling of the given value.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.255Z
(checkpoint dataframe)
(checkpoint dataframe eager)
Params: ()
Result: Dataset[T]
Eagerly checkpoint a Dataset and return the new Dataset. Checkpointing can be used to truncate the logical plan of this Dataset, which is especially useful in iterative algorithms where the plan may grow exponentially. It will be saved to files inside the checkpoint directory set with SparkContext#setCheckpointDir.
2.1.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.752Z
(checkpoint-dir)
(checkpoint-dir spark)
Params:
Result: Optional[String]
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html
Timestamp: 2020-10-19T01:56:49.509Z
(clip expr low high)
Returns a new Column where values outside `[low, high]` are clipped to the interval edges.
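For example, clamping a hypothetical `:score` column into [0.0, 1.0]:

```clojure
(-> df
    (g/with-column :score (g/clip :score 0.0 1.0)))
```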
Column: Returns the first column that is not null, or null if all inputs are null.
Dataset: Returns a new Dataset that has exactly numPartitions partitions, when fewer partitions are requested.
Params: (colName: String)
Result: Column
Returns a Column based on the given column name.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.258Z
(col-regex dataframe col-name)
Params: (colName: String)
Result: Column
Selects column based on the column name specified as a regex and returns it as Column.
2.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.758Z
(collect dataframe)
Params: ()
Result: Array[T]
Returns an array that contains all rows in this Dataset.
Running collect requires moving all the data into the application's driver process, and doing so on a very large dataset can crash the driver process with OutOfMemoryError.
For Java API, use collectAsList.
1.6.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.759Z
(collect-col dataframe col-name)
Returns a vector that contains all rows in the column of the Dataset.
(collect-list expr)
Params: (e: Column)
Result: Column
Aggregate function: returns a list of objects with duplicates.
1.6.0
The function is non-deterministic because the order of collected results depends on the order of the rows which may be non-deterministic after a shuffle.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.261Z
(collect-set expr)
Params: (e: Column)
Result: Column
Aggregate function: returns a set of objects with duplicate elements eliminated.
1.6.0
The function is non-deterministic because the order of collected results depends on the order of the rows which may be non-deterministic after a shuffle.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.263Z
(collect-to-arrow rdd chunk-size out-dir)
Collects the dataframe on the driver and exports it as Arrow files.
The data is transferred by partition, so each partition should be small enough to fit in the driver's heap space. The data is then saved to disk as Arrow files in chunks of `chunk-size` rows.
`rdd`: Spark dataset.
`chunk-size`: number of rows each Arrow file will have; should be small enough for the data to fit in the driver's heap space.
`out-dir`: output directory for the Arrow files.
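For example, exporting `df` in chunks of 10,000 rows to a hypothetical output directory:

```clojure
(g/collect-to-arrow df 10000 "/tmp/arrow-out")
```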
(collect-vals dataframe)
Returns the vector values of the Dataset collected.
(column-names dataframe)
Returns all column names as an array of strings.
(columns dataframe)
Returns all column names as an array of keywords.
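A short sketch of the column-inspection helpers on a dataframe `df`; the outputs shown are illustrative only:

```clojure
(g/column-names df)       ;; => e.g. ["price" "quantity"]
(g/columns df)            ;; => e.g. [:price :quantity]
(g/collect-col df :price) ;; => a vector of the values in :price
```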
(compatible? bloom other)
Params: (other: BloomFilter)
Result: Boolean
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/util/sketch/BloomFilter.html
Timestamp: 2020-10-19T01:56:25.740Z
(concat & exprs)
Params: (exprs: Column*)
Result: Column
Concatenates multiple input columns together into a single column. The function works with strings, binary and compatible array columns.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.265Z
(concat-ws sep & exprs)
Params: (sep: String, exprs: Column*)
Result: Column
Concatenates multiple input string columns together into a single string column, using the given separator.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.267Z
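For example, joining two hypothetical string columns with a space separator:

```clojure
(-> df
    (g/with-column :full-name (g/concat-ws " " :first-name :last-name)))
```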
(cond & clauses)
Returns a new Column imitating Clojure's `cond` macro behaviour.
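A hedged sketch of the clause style, by analogy with `clojure.core/cond`: condition/value pairs evaluated in order, over a hypothetical `:price` column:

```clojure
(-> df
    (g/with-column :price-band
      (g/cond
        (g/< :price 500000)   "low"
        (g/< :price 1500000)  "mid"
        (g/>= :price 1500000) "high")))
```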
(condp pred expr & clauses)
Returns a new Column imitating Clojure's `condp` macro behaviour.
(conf)
(conf spark)
Params:
Result: SparkConf
Return a copy of this JavaSparkContext's configuration. The configuration cannot be changed at runtime.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html
Timestamp: 2020-10-19T01:56:49.511Z
(confidence cms)
Params: ()
Result: Double
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/util/sketch/CountMinSketch.html
Timestamp: 2020-10-19T01:56:26.102Z
(contains expr literal)
Params: (other: Any)
Result: Column
Contains the other element. Returns a boolean column based on a string match.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.888Z
(conv expr from-base to-base)
Params: (num: Column, fromBase: Int, toBase: Int)
Result: Column
Convert a number in a string column from one base to another.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.268Z
Column: Aggregate function: returns the Pearson Correlation Coefficient for two columns.
Dataset: Calculates the Pearson Correlation Coefficient of two columns of a DataFrame.
(cos expr)
Params: (e: Column)
Result: Column
angle in radians
cosine of the angle, as if computed by java.lang.Math.cos
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.272Z
(cosh expr)
Params: (e: Column)
Result: Column
hyperbolic angle
hyperbolic cosine of the angle, as if computed by java.lang.Math.cosh
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.275Z
Column: Aggregate function: returns the number of items in a group.
Dataset: Returns the number of rows in the Dataset.
RelationalGroupedDataset: Count the number of rows for each group.
(count-distinct & exprs)
Params: (expr: Column, exprs: Column*)
Result: Column
Aggregate function: returns the number of distinct items in a group.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.279Z
(count-min-sketch dataframe expr eps-or-depth confidence-or-width seed)
Params: (colName: String, depth: Int, width: Int, seed: Int)
Result: CountMinSketch
Builds a Count-min Sketch over a specified column.
name of the column over which the sketch is built
depth of the sketch
width of the sketch
random seed
a CountMinSketch over column colName
2.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/DataFrameStatFunctions.html
Timestamp: 2020-10-19T01:56:24.659Z
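A sketch combining `count-min-sketch` with `estimate-count` (documented further down); the `:item` column, the depth/width/seed values and the `g` alias are assumptions.

```clojure
;; Build a sketch over :item with depth 10, width 200 and seed 42,
;; then query the approximate count of a single value.
(let [cms (g/count-min-sketch dataframe :item 10 200 42)]
  (g/estimate-count cms "apple"))
```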
(cov dataframe col-name1 col-name2)
Params: (col1: String, col2: String)
Result: Double
Calculate the sample covariance of two numerical columns of a DataFrame.
the name of the first column
the name of the second column
the covariance of the two columns.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/DataFrameStatFunctions.html
Timestamp: 2020-10-19T01:56:24.661Z
(covar l-expr r-expr)
Params: (column1: Column, column2: Column)
Result: Column
Aggregate function: returns the sample covariance for two columns.
2.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.284Z
(covar-pop l-expr r-expr)
Params: (column1: Column, column2: Column)
Result: Column
Aggregate function: returns the population covariance for two columns.
2.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.282Z
(covar-samp l-expr r-expr)
Params: (column1: Column, column2: Column)
Result: Column
Aggregate function: returns the sample covariance for two columns.
2.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.284Z
(crc-32 expr)
Params: (e: Column)
Result: Column
Calculates the cyclic redundancy check value (CRC32) of a binary column and returns the value as a bigint.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.285Z
(crc32 expr)
Params: (e: Column)
Result: Column
Calculates the cyclic redundancy check value (CRC32) of a binary column and returns the value as a bigint.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.285Z
(create-dataframe rows schema)
(create-dataframe spark rows schema)
Params: (rdd: RDD[A])
(implicit arg0: scala.reflect.api.JavaUniverse.TypeTag[A])
Result: DataFrame
Creates a DataFrame from an RDD of Product (e.g. case classes, tuples).
2.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/SparkSession.html
Timestamp: 2020-10-19T01:56:50.125Z
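A small sketch of building a Dataset from rows and an explicit schema; `g/row`, `g/struct-type` and `g/struct-field`, as well as the keyword data types, are assumed from the wider Geni API and are not documented in this section.

```clojure
;; Two rows with a :number long column and a :word string column.
(g/create-dataframe
  [(g/row 32 "horse")
   (g/row 64 "mouse")]
  (g/struct-type
    (g/struct-field :number :long true)
    (g/struct-field :word :string true)))
```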
(create-global-temp-view! dataframe view-name)
Creates a global temporary view using the given name. Global temporary view is cross-session. Its lifetime is the lifetime of the Spark application, i.e. it will be automatically dropped when the application terminates. It's tied to a system preserved database `global_temp`, and we must use the qualified name to refer to a global temp view, e.g. `SELECT * FROM global_temp.view1`.
(create-or-replace-global-temp-view! dataframe view-name)
Creates or replaces a global temporary view using the given name. Global temporary view is cross-session. Its lifetime is the lifetime of the Spark application, i.e. it will be automatically dropped when the application terminates. It's tied to a system preserved database `global_temp`, and we must use the qualified name to refer to a global temp view, e.g. `SELECT * FROM global_temp.view1`.
(create-or-replace-temp-view! dataframe view-name)
Creates or replaces a local temporary view using the given name. The lifetime of this temporary view is tied to the `SparkSession` that was used to create this Dataset.
(create-spark-session
{:keys [app-name master configs log-level checkpoint-dir]
:or {app-name "Geni App" master "local[*]" configs {} log-level "WARN"}})
The entry point to programming Spark with the Dataset and DataFrame API.
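Based on the option map shown above, a session might be created like the following sketch; the app name and the config key/value used are illustrative only.

```clojure
;; All keys are optional; unspecified ones fall back to the defaults above.
(def spark
  (g/create-spark-session
    {:app-name  "geni-demo"
     :master    "local[*]"
     :configs   {:spark.sql.shuffle.partitions "4"}
     :log-level "WARN"}))
```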
(create-temp-view! dataframe view-name)
Creates a local temporary view using the given name. Local temporary view is session-scoped. Its lifetime is the lifetime of the session that created it, i.e. it will be automatically dropped when the session terminates. It's not tied to any databases, i.e. we can't use `db1.view1` to reference a local temporary view.
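A sketch of registering a temporary view and querying it with SQL; `g/sql` and `g/show` are assumed from the wider Geni API, and `dataframe` and its columns are made up.

```clojure
;; Register the Dataset under the name "people", then query it.
(g/create-or-replace-temp-view! dataframe "people")

(-> (g/sql "SELECT name, age FROM people WHERE age > 30")
    (g/show))
```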
(cross-join left right)
Params: (right: Dataset[_])
Result: DataFrame
Explicit cartesian join with another DataFrame.
Right side of the join operation.
2.1.0
Cartesian joins are very expensive without an extra filter that can be pushed down.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.770Z
(crosstab dataframe col-name1 col-name2)
Params: (col1: String, col2: String)
Result: DataFrame
Computes a pair-wise frequency table of the given columns. Also known as a contingency table. The number of distinct values for each column should be less than 1e4. At most 1e6 non-zero pair frequencies will be returned. The first column of each row will be the distinct values of col1 and the column names will be the distinct values of col2. The name of the first column will be col1_col2. Counts will be returned as Longs. Pairs that have no occurrences will have zero as their counts. Null elements will be replaced by "null", and back ticks will be dropped from elements if they exist.
The name of the first column. Distinct items will make the first item of each row.
The name of the second column. Distinct items will make the column names of the DataFrame.
A DataFrame containing the contingency table.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/DataFrameStatFunctions.html
Timestamp: 2020-10-19T01:56:24.664Z
(cube dataframe & exprs)
Params: (cols: Column*)
Result: RelationalGroupedDataset
Create a multi-dimensional cube for the current Dataset using the specified columns, so we can run aggregation on them. See RelationalGroupedDataset for all the available aggregate functions.
2.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.778Z
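A sketch of cubing on two columns and aggregating, assuming `g/agg` and `g/sum` behave as in the wider Geni API and that `dataframe` has `:store`, `:product` and `:amount` columns (all assumptions).

```clojure
;; Produces subtotals for every combination of :store and :product,
;; including the grand total.
(-> dataframe
    (g/cube :store :product)
    (g/agg (g/sum :amount))
    (g/show))
```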
(cube-root expr)
Params: (e: Column)
Result: Column
Computes the cube-root of the given value.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.253Z
(cume-dist)
Params: ()
Result: Column
Window function: returns the cumulative distribution of values within a window partition, i.e. the fraction of rows that are below the current row.
1.6.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.286Z
(current-date)
Params: ()
Result: Column
Returns the current date as a date column.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.287Z
(current-timestamp)
Params: ()
Result: Column
Returns the current timestamp as a timestamp column.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.288Z
(cut expr bins)
Returns a new Column of discretised `expr` into the intervals of `bins`.
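A hypothetical sketch, assuming `bins` is an ordered Clojure vector of interval boundaries; the `:age` column, the boundary values and the use of `g/with-column` and `g/show` are assumptions.

```clojure
;; Discretise :age into the intervals delimited by 18 and 65.
(-> dataframe
    (g/with-column :age-group (g/cut :age [18 65]))
    (g/show))
```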
(date-add expr days)
Params: (start: Column, days: Int)
Result: Column
Returns the date that is `days` days after `start`.
A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
The number of days to add to start, can be negative to subtract days
A date, or null if start was a string that could not be cast to a date
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.295Z
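For instance, shifting a date column by a week in both directions (the `:start-date` column, the `g` alias and `g/select`/`g/show` are assumptions; `date-sub` is documented further down):

```clojure
(-> dataframe
    (g/select :start-date
              (g/date-add :start-date 7)
              (g/date-sub :start-date 7))
    (g/show))
```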
(date-diff l-expr r-expr)
Params: (end: Column, start: Column)
Result: Column
Returns the number of days from start to end.
Only considers the date part of the input.
A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
An integer, or null if either end or start were strings that could not be cast to a date. Negative if end is before start
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.304Z
(date-format expr date-fmt)
Params: (dateExpr: Column, format: String)
Result: Column
Converts a date/timestamp/string to a value of string in the format specified by the date format given by the second argument.
See Datetime Patterns for valid date and time format patterns
A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
A pattern dd.MM.yyyy would return a string like 18.03.1993
A string, or null if dateExpr was a string that could not be cast to a timestamp
1.5.0
IllegalArgumentException if the format pattern is invalid
Use specialized functions like year whenever possible as they benefit from a specialized implementation.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.297Z
(date-sub expr days)
Params: (start: Column, days: Int)
Result: Column
Returns the date that is `days` days before `start`.
A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
The number of days to subtract from start, can be negative to add days
A date, or null if start was a string that could not be cast to a date
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.300Z
(date-trunc fmt expr)
Params: (format: String, timestamp: Column)
Result: Column
Returns timestamp truncated to the unit specified by the format.
For example, date_trunc("year", "2018-11-19 12:01:19") returns 2018-01-01 00:00:00
A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
A timestamp, or null if timestamp was a string that could not be cast to a timestamp or format was an invalid value
2.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.302Z
(datediff l-expr r-expr)
Params: (end: Column, start: Column)
Result: Column
Returns the number of days from start to end.
Only considers the date part of the input.
A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
An integer, or null if either end or start were strings that could not be cast to a date. Negative if end is before start
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.304Z
(day-of-month expr)
Params: (e: Column)
Result: Column
Extracts the day of the month as an integer from a given date/timestamp/string.
An integer, or null if the input was a string that could not be cast to a date
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.305Z
(day-of-week expr)
Params: (e: Column)
Result: Column
Extracts the day of the week as an integer from a given date/timestamp/string. Ranges from 1 for a Sunday through to 7 for a Saturday
An integer, or null if the input was a string that could not be cast to a date
2.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.306Z
(day-of-year expr)
Params: (e: Column)
Result: Column
Extracts the day of the year as an integer from a given date/timestamp/string.
An integer, or null if the input was a string that could not be cast to a date
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.307Z
(dayofmonth expr)
Params: (e: Column)
Result: Column
Extracts the day of the month as an integer from a given date/timestamp/string.
An integer, or null if the input was a string that could not be cast to a date
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.305Z
(dayofweek expr)
Params: (e: Column)
Result: Column
Extracts the day of the week as an integer from a given date/timestamp/string. Ranges from 1 for a Sunday through to 7 for a Saturday
An integer, or null if the input was a string that could not be cast to a date
2.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.306Z
(dayofyear expr)
Params: (e: Column)
Result: Column
Extracts the day of the year as an integer from a given date/timestamp/string.
An integer, or null if the input was a string that could not be cast to a date
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.307Z
(dec expr)
Returns an expression one less than `expr`.
(decode expr charset)
Params: (value: Column, charset: String)
Result: Column
Computes the first argument into a string from a binary using the provided character set (one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16'). If either argument is null, the result will also be null.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.309Z
(default-min-partitions)
(default-min-partitions spark)
Params:
Result: Integer
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html
Timestamp: 2020-10-19T01:56:49.503Z
(default-parallelism)
(default-parallelism spark)
Params:
Result: Integer
Default level of parallelism to use when not given by user (e.g. parallelize and makeRDD).
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html
Timestamp: 2020-10-19T01:56:49.504Z
(degrees expr)
Params: (e: Column)
Result: Column
Converts an angle measured in radians to an approximately equivalent angle measured in degrees.
angle in radians
angle in degrees, as if computed by java.lang.Math.toDegrees
2.1.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.312Z
(dense & values)
Params: (firstValue: Double, otherValues: Double*)
Result: Vector
Creates a dense vector from its values.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/ml/linalg/Vectors$.html
Timestamp: 2020-10-19T01:56:35.334Z
(dense-rank)
Params: ()
Result: Column
Window function: returns the rank of rows within a window partition, without any gaps.
The difference between rank and dense_rank is that dense_rank leaves no gaps in the ranking sequence when there are ties. That is, if you were ranking a competition using dense_rank and had three people tie for second place, you would say that all three were in second place and that the next person came in third. The rank function, by contrast, gives sequential numbers, so the person who came in third place (after the ties) would register as coming in fifth.
This is equivalent to the DENSE_RANK function in SQL.
1.6.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.313Z
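A sketch of using `dense-rank` over a window; `g/over` and `g/window` (with `:partition-by`/`:order-by` keys) are assumed from the wider Geni API and are not documented in this section, and the column names are made up. `desc` is documented below.

```clojure
;; Rank rows within each :department by descending :salary, without gaps.
(-> dataframe
    (g/with-column :rank
      (g/over (g/dense-rank)
              (g/window {:partition-by [:department]
                         :order-by     [(g/desc :salary)]})))
    (g/show))
```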
(depth cms)
Params: ()
Result: Int
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/util/sketch/CountMinSketch.html
Timestamp: 2020-10-19T01:56:26.103Z
(desc expr)
Params:
Result: Column
Returns a sort expression based on the descending order of the column.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.890Z
(desc-nulls-first expr)
Params:
Result: Column
Returns a sort expression based on the descending order of the column, and null values appear before non-null values.
2.1.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.891Z
(desc-nulls-last expr)
Params:
Result: Column
Returns a sort expression based on the descending order of the column, and null values appear after non-null values.
2.1.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.893Z
(describe dataframe & col-names)
Params: (cols: String*)
Result: DataFrame
Computes basic statistics for numeric and string columns, including count, mean, stddev, min, and max. If no columns are given, this function computes statistics for all numerical or string columns.
This function is meant for exploratory data analysis, as we make no guarantee about the backward compatibility of the schema of the resulting Dataset. If you want to programmatically compute summary statistics, use the agg function instead.
Use summary for expanded statistics and control over which statistics to compute.
Columns to compute statistics on.
1.6.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.780Z
Flag for controlling the storage of an RDD.
The DataFrame is stored only on disk, and CPU computation time is high because I/O is involved.
Flag for controlling the storage of an RDD.
Same as the disk-only storage level, but replicates each partition to two cluster nodes.
Column: Returns a map whose keys are not in `ks`.
Dataset: variadic version of `drop`.
(distinct dataframe)
Params: ()
Result: Dataset[T]
Returns a new Dataset that contains only the unique rows from this Dataset. This is an alias for dropDuplicates.
2.0.0
Equality checking is performed directly on the encoded representation of the data and thus is not affected by a custom equals function defined on T.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.781Z
(drop dataframe & col-names)
Params: (colName: String)
Result: DataFrame
Returns a new Dataset with a column dropped. This is a no-op if schema doesn't contain column name.
This method can only be used to drop top-level columns. The colName string is treated literally, without further interpretation.
2.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.785Z
(drop-duplicates dataframe & col-names)
Params: ()
Result: Dataset[T]
Returns a new Dataset that contains only the unique rows from this Dataset. This is an alias for distinct.
For a static batch Dataset, it just drops duplicate rows. For a streaming Dataset, it will keep all data across triggers as intermediate state to drop duplicate rows. You can use withWatermark to limit how late the duplicate data can be, and the system will accordingly limit the state. In addition, data older than the watermark will be dropped to avoid any possibility of duplicates.
2.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.791Z
(drop-na dataframe)
(drop-na dataframe min-non-nulls-or-cols)
(drop-na dataframe min-non-nulls cols)
Params: ()
Result: DataFrame
Returns a new DataFrame that drops rows containing any null or NaN values.
1.3.1
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/DataFrameNaFunctions.html
Timestamp: 2020-10-19T01:56:23.886Z
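The three arities above might be used as follows; the column names are assumptions, and the behaviour descriptions in the comments follow the scraped Spark docs.

```clojure
(g/drop-na dataframe)            ;; drop rows containing any null or NaN value
(g/drop-na dataframe 2)          ;; keep rows with at least 2 non-null values
(g/drop-na dataframe 1 [:x :y])  ;; consider only columns :x and :y
```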
(dtypes dataframe)
Params:
Result: Array[(String, String)]
Returns all column names and their data types as an array.
1.6.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.792Z
(element-at expr value)
Params: (column: Column, value: Any)
Result: Column
Returns the element of the array at the given index if the column is an array, or the value for the given key if the column is a map.
2.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.318Z
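A sketch of indexing into an array column; note that Spark's `element_at` uses 1-based indexing for arrays. The `:xs` column, the `g` alias and `g/select`/`g/show` are assumptions.

```clojure
;; First element of the array column :xs.
(-> dataframe
    (g/select (g/element-at :xs 1))
    (g/show))
```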
(empty? dataframe)
Params:
Result: Boolean
Returns true if the Dataset is empty.
2.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.840Z
(encode expr charset)
Params: (value: Column, charset: String)
Result: Column
Computes the first argument into a binary from a string using the provided character set (one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16'). If either argument is null, the result will also be null.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.319Z
(ends-with expr literal)
Params: (other: Column)
Result: Column
String ends with. Returns a boolean column based on a string match.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.898Z
(estimate-count cms item)
Params: (item: Any)
Result: Long
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/util/sketch/CountMinSketch.html
Timestamp: 2020-10-19T01:56:26.104Z
(even? expr)
Returns true if `expr` is even, else false.
(except dataframe other)
Params: (other: Dataset[T])
Result: Dataset[T]
Returns a new Dataset containing rows in this Dataset but not in another Dataset. This is equivalent to EXCEPT DISTINCT in SQL.
2.0.0
Equality checking is performed directly on the encoded representation of the data and thus is not affected by a custom equals function defined on T.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.796Z
(except-all dataframe other)
Params: (other: Dataset[T])
Result: Dataset[T]
Returns a new Dataset containing rows in this Dataset but not in another Dataset while preserving the duplicates. This is equivalent to EXCEPT ALL in SQL.
2.4.0
Equality checking is performed directly on the encoded representation of the data and thus is not affected by a custom equals function defined on T. Also as standard in SQL, this function resolves columns by position (not by name).
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.798Z
(exists expr predicate)
Params: (column: Column, f: (Column) ⇒ Column)
Result: Column
Returns whether a predicate holds for one or more elements in the array.
the input array column
col => predicate, the Boolean predicate to check the input column
3.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.322Z
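A hedged sketch, assuming the predicate can be passed as an ordinary Clojure function of one Column, as the `(Column) ⇒ Column` parameter suggests; the `:xs` column and the `g` alias are made up. `forall`, documented below, follows the same shape.

```clojure
;; Does any element of :xs exceed 100?  Are all elements positive?
(-> dataframe
    (g/select (g/exists :xs (fn [x] (g/< 100 x)))
              (g/forall :xs (fn [x] (g/< 0 x))))
    (g/show))
```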
(exp expr)
Params: (e: Column)
Result: Column
Computes the exponential of the given value.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.324Z
(expected-fpp bloom)
Params: ()
Result: Double
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/util/sketch/BloomFilter.html
Timestamp: 2020-10-19T01:56:25.739Z
Column: Prints the expression to the console for debugging purposes.
Dataset: Prints the physical plan to the console for debugging purposes.
(explode expr)
Params: (e: Column)
Result: Column
Creates a new row for each element in the given array or map column. Uses the default column name col for elements in the array and key and value for elements in the map unless specified otherwise.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.325Z
(expm-1 expr)
Params: (e: Column)
Result: Column
Computes the exponential of the given value minus one.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.329Z
(expm1 expr)
Params: (e: Column)
Result: Column
Computes the exponential of the given value minus one.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.329Z
(expr s)
Params: (expr: String)
Result: Column
Parses the expression string into the column that it represents, similar to Dataset#selectExpr.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.330Z
(factorial expr)
Params: (e: Column)
Result: Column
Computes the factorial of the given value.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.331Z
(fill-na dataframe value)
(fill-na dataframe value cols)
Params: (value: Long)
Result: DataFrame
Returns a new DataFrame that replaces null or NaN values in numeric columns with value.
2.2.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/DataFrameNaFunctions.html
Timestamp: 2020-10-19T01:56:23.908Z
Column: Returns an array of elements for which a predicate holds in a given array.
Dataset: Filters rows using the given condition.
Column: Aggregate function: returns the first value of a column in a group.
Dataset: Returns the first row.
(first-vals dataframe)
Returns the vector values of the first row in the Dataset collected.
(flatten expr)
Params: (e: Column)
Result: Column
Creates a single array from an array of arrays. If a structure of nested arrays is deeper than two levels, only one level of nesting is removed.
2.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.345Z
(floor expr)
Params: (e: Column)
Result: Column
Computes the floor of the given value.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.347Z
(forall expr predicate)
Params: (column: Column, f: (Column) ⇒ Column)
Result: Column
Returns whether a predicate holds for every element in the array.
the input array column
col => predicate, the Boolean predicate to check the input column
3.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.349Z
(format-number expr decimal-places)
Params: (x: Column, d: Int)
Result: Column
Formats numeric column x to a format like '#,###,###.##', rounded to d decimal places with HALF_EVEN round mode, and returns the result as a string column.
If d is 0, the result has no decimal point or fractional part. If d is less than 0, the result will be null.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.350Z
(format-string fmt & exprs)
Params: (format: String, arguments: Column*)
Result: Column
Formats the arguments in printf-style and returns the result as a string column.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.351Z
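For example, printf-style formatting of two columns into one string column; the `:category` and `:id` columns, the `g` alias and `g/select`/`g/show` are assumptions.

```clojure
;; e.g. "books-00042" for :category "books" and integer :id 42.
(-> dataframe
    (g/select (g/format-string "%s-%05d" :category :id))
    (g/show))
```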
(freq-items dataframe col-names)
(freq-items dataframe col-names support)
Params: (cols: Array[String], support: Double)
Result: DataFrame
Finds frequent items for columns, possibly with false positives, using the frequent element count algorithm proposed by Karp, Schenker, and Papadimitriou. The support should be greater than 1e-4.
This function is meant for exploratory data analysis, as we make no guarantee about the backward compatibility of the schema of the resulting DataFrame.
the names of the columns to search frequent items in.
The minimum frequency for an item to be considered frequent. Should be greater than 1e-4.
A Local DataFrame with the Array of frequent items for each column.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/DataFrameStatFunctions.html
Timestamp: 2020-10-19T01:56:24.676Z
(from-csv expr schema)
(from-csv expr schema options)
Params: (e: Column, schema: StructType, options: Map[String, String])
Result: Column
Parses a column containing a CSV string into a StructType with the specified schema. Returns null, in the case of an unparseable string.
a string column containing CSV data.
the schema to use when parsing the CSV string
options to control how the CSV is parsed. Accepts the same options as the CSV data source.
3.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.354Z
(from-json expr schema)
(from-json expr schema options)
Params: (e: Column, schema: StructType, options: Map[String, String])
Result: Column
(Scala-specific) Parses a column containing a JSON string into a StructType with the specified schema. Returns null in the case of an unparseable string.
a string column containing JSON data.
the schema to use when parsing the json string
options to control how the json is parsed. Accepts the same options as the json data source.
2.1.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.372Z
(from-unixtime expr)
(from-unixtime expr fmt)
Params: (ut: Column)
Result: Column
Converts the number of seconds from unix epoch (1970-01-01 00:00:00 UTC) to a string representing the timestamp of that moment in the current system time zone in the yyyy-MM-dd HH:mm:ss format.
A number of a type that is castable to a long, such as string or integer. Can be negative for timestamps before the unix epoch
A string, or null if the input was a string that could not be cast to a long
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.375Z
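A sketch of both arities, assuming the `g` alias and a `g/select` function as used elsewhere in this library; the rendered strings depend on the session time zone:
```clojure
;; Render epoch seconds with the default pattern and with an explicit one.
(-> (g/map->dataset {:epoch [0 1577836800]})
    (g/select (g/from-unixtime :epoch)
              (g/from-unixtime :epoch "yyyy-MM-dd"))
    g/show)
;; 1577836800 s corresponds to 2020-01-01 00:00:00 UTC; the exact output
;; depends on the current system time zone.
```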
(get-checkpoint-dir)
(get-checkpoint-dir spark)
Params:
Result: Optional[String]
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html
Timestamp: 2020-10-19T01:56:49.509Z
(get-conf)
(get-conf spark)
Params:
Result: SparkConf
Return a copy of this JavaSparkContext's configuration. The configuration cannot be changed at runtime.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html
Timestamp: 2020-10-19T01:56:49.511Z
(get-field expr field-name)
Params: (fieldName: String)
Result: Column
An expression that gets a field by name in a StructType.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.913Z
(get-item expr k)
Params: (key: Any)
Result: Column
An expression that gets an item at position ordinal out of an array, or gets a value by key key in a MapType.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.915Z
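A sketch of get-item on a map column built with the `map` function documented further below; `g/with-column` and keyword-to-column coercion are assumed:
```clojure
;; Build a single-entry map column per row, then look a value up by key.
(-> (g/map->dataset {:k ["a" "b"] :v [1 2]})
    (g/with-column :m (g/map :k :v))           ; map column {k -> v}
    (g/with-column :a-val (g/get-item :m "a")) ; value under key "a", or null
    g/show)
```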
(get-local-property k)
(get-local-property spark k)
Params: (key: String)
Result: String
Get a local property set in this thread, or null if it is missing. See org.apache.spark.api.java.JavaSparkContext.setLocalProperty.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html
Timestamp: 2020-10-19T01:56:49.512Z
(get-persistent-rdds)
(get-persistent-rdds spark)
Params:
Result: Map[Integer, JavaRDD[_]]
Returns a Java map of JavaRDDs that have marked themselves as persistent via cache() call.
This does not necessarily mean the caching or computation was successful.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html
Timestamp: 2020-10-19T01:56:49.513Z
(get-spark-home)
(get-spark-home spark)
Params: ()
Result: Optional[String]
Get Spark's home location from either a value set through the constructor, or the spark.home Java property, or the SPARK_HOME environment variable (in that order of preference). If none of these is set, return None.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html
Timestamp: 2020-10-19T01:56:49.518Z
(greatest & exprs)
Params: (exprs: Column*)
Result: Column
Returns the greatest value of the list of values, skipping null values. This function takes at least 2 parameters. It will return null iff all parameters are null.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.382Z
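A row-wise sketch, assuming `g/select`; `least` (documented further below) is shown alongside for contrast:
```clojure
;; Per-row maximum and minimum across two columns.
(-> (g/map->dataset {:a [1 5 7] :b [4 2 9]})
    (g/select (g/greatest :a :b) (g/least :a :b))
    g/show)
```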
(group-by dataframe & exprs)
Params: (cols: Column*)
Result: RelationalGroupedDataset
Groups the Dataset using the specified columns, so we can run aggregation on them. See RelationalGroupedDataset for all the available aggregate functions.
2.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.827Z
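A sketch of grouping plus aggregation, assuming `g/agg` and the `mean` aggregate described elsewhere in this document; the data is illustrative:
```clojure
;; Average price per flavour.
(-> (g/map->dataset {:flavour ["mango" "mango" "kiwi"] :price [1.0 2.0 3.0]})
    (g/group-by :flavour)
    (g/agg (g/mean :price))
    g/show)
```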
(grouping expr)
Params: (e: Column)
Result: Column
Aggregate function: indicates whether a specified column in a GROUP BY list is aggregated or not, returns 1 for aggregated or 0 for not aggregated in the result set.
2.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.388Z
(grouping-id & exprs)
Params: (cols: Column*)
Result: Column
Aggregate function: returns the level of grouping, equal to (grouping(c1) << (n-1)) + (grouping(c2) << (n-2)) + ... + grouping(cn).
2.0.0
The list of columns should match with grouping columns exactly, or empty (means all the grouping columns).
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.390Z
(hash & exprs)
Params: (cols: Column*)
Result: Column
Calculates the hash code of given columns, and returns the result as an int column.
2.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.391Z
(hash-code expr)
Params: ()
Result: Int
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.918Z
(head dataframe)
(head dataframe n-rows)
Params: (n: Int)
Result: Array[T]
Returns the first n rows.
1.6.0
this method should only be used if the resulting array is expected to be small, as all the data is loaded into the driver's memory.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.834Z
(head-vals dataframe)
(head-vals dataframe n-rows)
Returns the vector values of the first n rows in the Dataset collected.
(hex expr)
Params: (column: Column)
Result: Column
Computes hex value of the given column.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.393Z
(hint dataframe hint-name & args)
Params: (name: String, parameters: Any*)
Result: Dataset[T]
Specifies some hint on the current Dataset. As an example, one of the plans of a join can be marked as broadcastable via this method.
2.2.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.835Z
(hour expr)
Params: (e: Column)
Result: Column
Extracts the hours as an integer from a given date/timestamp/string.
An integer, or null if the input was a string that could not be cast to a date
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.394Z
(hypot left-expr right-expr)
Params: (l: Column, r: Column)
Result: Column
Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.406Z
(if condition if-expr)
(if condition if-expr else-expr)
Params: (condition: Column, value: Any)
Result: Column
Evaluates a list of conditions and returns one of multiple possible result expressions. If otherwise is not defined at the end, null is returned for unmatched conditions.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.724Z
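A sketch of a conditional column; `g/lit` is documented further below, while `g/<` and `g/with-column` are assumed to exist in this library:
```clojure
;; Label each reading as cold or warm.
(-> (g/map->dataset {:temp [10 25 35]})
    (g/with-column :label (g/if (g/< :temp 20) (g/lit "cold") (g/lit "warm")))
    g/show)
```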
(inc expr)
Returns an expression one greater than `expr`.
(initcap expr)
Params: (e: Column)
Result: Column
Returns a new string column by converting the first letter of each word to uppercase. Words are delimited by whitespace.
For example, "hello world" will become "Hello World".
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.407Z
(input-file-name)
Params: ()
Result: Column
Creates a string column for the file name of the current Spark task.
1.6.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.408Z
(input-files dataframe)
Params:
Result: Array[String]
Returns a best-effort snapshot of the files that compose this Dataset. This method simply asks each constituent BaseRelation for its respective files and takes the union of all results. Depending on the source relations, this may not find all input files. Duplicates are removed.
2.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.837Z
(instr expr substr)
Params: (str: Column, substring: String)
Result: Column
Locate the position of the first occurrence of substr column in the given string. Returns null if either of the arguments are null.
1.5.0
The position is not zero-based but a 1-based index. Returns 0 if substr could not be found in str.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.409Z
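A sketch of the 1-based position semantics, assuming `g/select`:
```clojure
;; Position of "ar" within each string: 3 in "spark", 0 in "flink" (not found).
(-> (g/map->dataset {:s ["spark" "flink"]})
    (g/select (g/instr :s "ar"))
    g/show)
```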
Column: Aggregate function: returns the inter-quartile range of the values in a group.
RelationalGroupedDataset: Compute the inter-quartile range for each numeric column for each group.
(intersect dataframe other)
Params: (other: Dataset[T])
Result: Dataset[T]
Returns a new Dataset containing rows only in both this Dataset and another Dataset. This is equivalent to INTERSECT in SQL.
1.6.0
Equality checking is performed directly on the encoded representation of the data and thus is not affected by a custom equals function defined on T.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.838Z
(intersect-all dataframe other)
Params: (other: Dataset[T])
Result: Dataset[T]
Returns a new Dataset containing rows only in both this Dataset and another Dataset while preserving the duplicates. This is equivalent to INTERSECT ALL in SQL.
2.4.0
Equality checking is performed directly on the encoded representation of the data and thus is not affected by a custom equals function defined on T. Also as standard in SQL, this function resolves columns by position (not by name).
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.839Z
Column: Aggregate function: returns the inter-quartile range of the values in a group.
RelationalGroupedDataset: Compute the inter-quartile range for each numeric column for each group.
(is-compatible bloom other)
Params: (other: BloomFilter)
Result: Boolean
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/util/sketch/BloomFilter.html
Timestamp: 2020-10-19T01:56:25.740Z
(is-empty dataframe)
Params:
Result: Boolean
Returns true if the Dataset is empty.
2.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.840Z
(is-in-collection expr coll)
Params: (values: Iterable[_])
Result: Column
A boolean expression that is evaluated to true if the value of this expression is contained by the provided collection.
Note: Since the types of the elements in the collection are inferred only at run time, the elements will be "up-casted" to the most common type for comparison. For example: 1) in the case of "Int vs String", the "Int" will be up-casted to "String" and the comparison will look like "String vs String"; 2) in the case of "Float vs Double", the "Float" will be up-casted to "Double" and the comparison will look like "Double vs Double".
2.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.924Z
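A sketch of filtering on collection membership; `g/filter` is assumed to exist in this library, and the data is illustrative:
```clojure
;; Keep rows whose colour is in a local Clojure collection.
(-> (g/map->dataset {:colour ["red" "green" "blue"]})
    (g/filter (g/is-in-collection :colour ["red" "blue"]))
    g/show)
```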
(is-local dataframe)
Params:
Result: Boolean
Returns true if the collect and take methods can be run locally (without any Spark executors).
1.6.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.843Z
(is-nan expr)
Params:
Result: Column
True if the current expression is NaN.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.927Z
(is-not-null expr)
Params:
Result: Column
True if the current expression is NOT null.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.932Z
(is-null expr)
Params:
Result: Column
True if the current expression is null.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.933Z
(is-streaming dataframe)
Params:
Result: Boolean
Returns true if this Dataset contains one or more sources that continuously return data as it arrives. A Dataset that reads data from a streaming source must be executed as a StreamingQuery using the start() method in DataStreamWriter. Methods that return a single answer, e.g. count() or collect(), will throw an AnalysisException when there is a streaming source present.
2.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.844Z
(isin expr coll)
Params: (list: Any*)
Result: Column
A boolean expression that is evaluated to true if the value of this expression is contained by the evaluated values of the arguments.
Note: Since the types of the elements in the list are inferred only at run time, the elements will be "up-casted" to the most common type for comparison. For example: 1) in the case of "Int vs String", the "Int" will be up-casted to "String" and the comparison will look like "String vs String"; 2) in the case of "Float vs Double", the "Float" will be up-casted to "Double" and the comparison will look like "Double vs Double".
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.936Z
(jars)
(jars spark)
Params:
Result: List[String]
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html
Timestamp: 2020-10-19T01:56:49.532Z
(java-spark-context spark)
Converts a SparkSession to a JavaSparkContext.
(join left right expr)
(join left right expr join-type)
Params: (right: Dataset[_])
Result: DataFrame
Join with another DataFrame.
Behaves as an INNER JOIN and requires a subsequent join predicate.
Right side of the join operation.
2.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.856Z
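A sketch of the two arities above; passing a shared column keyword as `expr` is an assumption, as is the "left" join-type string (one of the types listed under join-with below):
```clojure
;; Inner join on :id, then a left join that keeps unmatched rows from `customers`.
(def customers (g/map->dataset {:id [1 2 3] :name ["a" "b" "c"]}))
(def orders    (g/map->dataset {:id [1 1 3] :total [10 20 30]}))

(g/show (g/join customers orders :id))
(g/show (g/join customers orders :id "left"))
```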
(join-with left right condition)
(join-with left right condition join-type)
Params: (other: Dataset[U], condition: Column, joinType: String)
Result: Dataset[(T, U)]
Joins this Dataset returning a Tuple2 for each pair where condition evaluates to true.
This is similar to the relation join function with one important difference in the result schema. Since joinWith preserves objects present on either side of the join, the result schema is similarly nested into a tuple under the column names _1 and _2.
This type of join can be useful both for preserving type-safety with the original object types as well as working with relational data where either side of the join has column names in common.
Right side of the join.
Join expression.
Type of join to perform. Default inner. Must be one of: inner, cross, outer, full, fullouter, full_outer, left, leftouter, left_outer, right, rightouter, right_outer.
1.6.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.860Z
(keys expr)
Params: (e: Column)
Result: Column
Returns an unordered array containing the keys of the map.
2.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.472Z
(kurtosis expr)
Params: (e: Column)
Result: Column
Aggregate function: returns the kurtosis of the values in a group.
1.6.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.416Z
(lag expr offset)
(lag expr offset default)
Params: (e: Column, offset: Int)
Result: Column
Window function: returns the value that is offset rows before the current row, and null if there are fewer than offset rows before the current row. For example, an offset of one will return the previous row at any given point in the window partition.
This is equivalent to the LAG function in SQL.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.421Z
Column: Aggregate function: returns the last value of the column in a group.
Dataset: Returns the last row.
(last-day expr)
Params: (e: Column)
Result: Column
Returns the last day of the month which the given date belongs to. For example, input "2015-07-27" returns "2015-07-31" since July 31 is the last day of the month in July 2015.
A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
A date, or null if the input was a string that could not be cast to a date
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.431Z
(last-vals dataframe)
Returns the vector values of the last row in the Dataset collected.
(lead expr offset)
(lead expr offset default)
Params: (columnName: String, offset: Int)
Result: Column
Window function: returns the value that is offset rows after the current row, and null if there are fewer than offset rows after the current row. For example, an offset of one will return the next row at any given point in the window partition.
This is equivalent to the LEAD function in SQL.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.437Z
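A window-function sketch covering lag and lead. The `g/window` spec map and `g/over` are assumed names for this library's WindowSpec helpers, and `readings` is a hypothetical dataframe:
```clojure
;; Previous and next value per sensor, ordered by timestamp.
;; `readings` is a hypothetical dataframe with :sensor, :ts and :value columns.
(def w (g/window {:partition-by :sensor :order-by :ts}))

(-> readings
    (g/with-column :prev (g/over (g/lag :value 1) w))
    (g/with-column :next (g/over (g/lead :value 1) w))
    g/show)
```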
(least & exprs)
Params: (exprs: Column*)
Result: Column
Returns the least value of the list of values, skipping null values. This function takes at least 2 parameters. It will return null iff all parameters are null.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.439Z
(length expr)
Params: (e: Column)
Result: Column
Computes the character length of a given string or the number of bytes of a binary string. The length of character strings includes the trailing spaces. The length of binary strings includes binary zeros.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.440Z
(levenshtein left-expr right-expr)
Params: (l: Column, r: Column)
Result: Column
Computes the Levenshtein distance of the two given string columns.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.441Z
(like expr literal)
Params: (literal: String)
Result: Column
SQL like expression. Returns a boolean column based on a SQL LIKE match.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.939Z
(limit dataframe n-rows)
Params: (n: Int)
Result: Dataset[T]
Returns a new Dataset by taking the first n rows. The difference between this function and head is that head is an action and returns an array (by triggering query execution) while limit returns a new Dataset.
2.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.861Z
(lit arg)
Params: (literal: Any)
Result: Column
Creates a Column of literal value.
The passed in object is returned directly if it is already a Column. If the object is a Scala Symbol, it is converted into a Column also. Otherwise, a new Column is created to represent the literal value.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.442Z
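A sketch of attaching constant columns, assuming `g/with-column`; the values are arbitrary:
```clojure
;; Tag every row with a constant source label and version number.
(-> (g/map->dataset {:x [1 2]})
    (g/with-column :source  (g/lit "manual"))
    (g/with-column :version (g/lit 3))
    g/show)
```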
(local? dataframe)
Params:
Result: Boolean
Returns true if the collect and take methods can be run locally (without any Spark executors).
1.6.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.843Z
(locate substr expr)
Params: (substr: String, str: Column)
Result: Column
Locate the position of the first occurrence of substr.
1.5.0
The position is not zero-based but a 1-based index. Returns 0 if substr could not be found in str.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.445Z
(log expr)
Params: (e: Column)
Result: Column
Computes the natural logarithm of the given value.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.449Z
(log-10 expr)
Params: (e: Column)
Result: Column
Computes the logarithm of the given value in base 10.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.451Z
(log-1p expr)
Params: (e: Column)
Result: Column
Computes the natural logarithm of the given value plus one.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.453Z
(log-2 expr)
Params: (expr: Column)
Result: Column
Computes the logarithm of the given column in base 2.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.455Z
(log10 expr)
Params: (e: Column)
Result: Column
Computes the logarithm of the given value in base 10.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.451Z
(log1p expr)
Params: (e: Column)
Result: Column
Computes the natural logarithm of the given value plus one.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.453Z
(log2 expr)
Params: (expr: Column)
Result: Column
Computes the logarithm of the given column in base 2.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.455Z
(lower expr)
Params: (e: Column)
Result: Column
Converts a string column to lower case.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.457Z
(lpad expr length pad)
Params: (str: Column, len: Int, pad: String)
Result: Column
Left-pad the string column with pad to a length of len. If the string column is longer than len, the return value is shortened to len characters.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.458Z
(ltrim expr)
Params: (e: Column)
Result: Column
Trim the spaces from the left end of the specified string value.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.460Z
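A sketch chaining a few of the string helpers above (lower, lpad, ltrim), assuming `g/select`; the data is illustrative:
```clojure
;; Normalise a code column three different ways.
(-> (g/map->dataset {:code ["  AB1" "C23"]})
    (g/select (g/lower :code)
              (g/lpad :code 6 "0")
              (g/ltrim :code))
    g/show)
```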
(map & exprs)
Params: (cols: Column*)
Result: Column
Creates a new map column. The input columns must be grouped as key-value pairs, e.g. (key1, value1, key2, value2, ...). The key columns must all have the same data type, and can't be null. The value columns must all have the same data type.
2.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.461Z
(map->dataset map-of-values)
(map->dataset spark map-of-values)
Construct a Dataset from an associative map.
(g/show (g/map->dataset {:a [1 2], :b [3 4]}))
; +---+---+
; |a |b |
; +---+---+
; |1 |3 |
; |2 |4 |
; +---+---+
(map-concat & exprs)
Params: (cols: Column*)
Result: Column
Returns the union of all the given maps.
2.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.462Z
(map-entries expr)
Params: (e: Column)
Result: Column
Returns an unordered array of all entries in the given map.
3.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.463Z
(map-filter expr predicate)
Params: (expr: Column, f: (Column, Column) ⇒ Column)
Result: Column
Returns a map whose key-value pairs satisfy a predicate.
the input map column
(key, value) => predicate, the Boolean predicate to filter the input map column
3.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.465Z
(map-from-arrays key-expr val-expr)
Params: (keys: Column, values: Column)
Result: Column
Creates a new map column. The array in the first column is used for keys. The array in the second column is used for values. All elements in the array for key should not be null.
2.4
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.470Z
(map-from-entries expr)
Params: (e: Column)
Result: Column
Returns a map created from the given array of entries.
2.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.471Z
(map-keys expr)
Params: (e: Column)
Result: Column
Returns an unordered array containing the keys of the map.
2.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.472Z
(map-type key-type val-type)
Creates a MapType by specifying the data type of keys `key-type`, the data type of values `val-type`, and whether values contain any null value `nullable`.
(map-values expr)
Params: (e: Column)
Result: Column
Returns an unordered array containing the values of the map.
2.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.473Z
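A sketch building a map column and inspecting it with map-keys, map-values and map-entries; `g/with-column` and `g/select` are assumed:
```clojure
;; One map entry per row, then pull the keys, values and entries back out.
(-> (g/map->dataset {:k ["a" "b"] :v [1 2]})
    (g/with-column :m (g/map :k :v))
    (g/select (g/map-keys :m) (g/map-values :m) (g/map-entries :m))
    g/show)
```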
(map-zip-with left right merge-fn)
Params: (left: Column, right: Column, f: (Column, Column, Column) ⇒ Column)
Result: Column
Merge two given maps, key-wise into a single map using a function.
the left input map column
the right input map column
(key, value1, value2) => new_value, the lambda function to merge the map values
3.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.474Z
(master)
(master spark)
Params:
Result: String
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html
Timestamp: 2020-10-19T01:56:49.532Z
Column: Aggregate function: returns the maximum value of the column in a group.
RelationalGroupedDataset: Compute the max value for each numeric column for each group.
(md-5 expr)
Params: (e: Column)
Result: Column
Calculates the MD5 digest of a binary column and returns the value as a 32 character hex string.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.478Z
(md5 expr)
Params: (e: Column)
Result: Column
Calculates the MD5 digest of a binary column and returns the value as a 32 character hex string.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.478Z
Column: Aggregate function: returns the average of the values in a group.
RelationalGroupedDataset: Compute the average value for each numeric column for each group.
Column: Aggregate function: returns the median range of the values in a group.
RelationalGroupedDataset: Compute the median range for each numeric column for each group.
Flag for controlling the storage of an RDD.
The default behavior of the DataFrame or Dataset. With this storage level, the DataFrame is stored in JVM memory as deserialized objects. When the required storage is greater than the available memory, it spills some of the excess partitions to disk and reads them back from disk when they are needed. This is slower, as I/O is involved.
Flag for controlling the storage of an RDD.
Same as the memory-and-disk storage level, but replicates each partition to two cluster nodes.
Flag for controlling the storage of an RDD.
Same as `memory-and-disk`, the difference being that it stores the DataFrame objects serialized in memory, and on disk when space is not available.
Flag for controlling the storage of an RDD.
Same as the memory-and-disk-ser storage level, but replicates each partition to two cluster nodes.
Flag for controlling the storage of an RDD.
Flag for controlling the storage of an RDD.
Same as the `memory-only` storage level, but replicates each partition to two cluster nodes.
Flag for controlling the storage of an RDD.
Same as `memory-only`, the difference being that it stores the RDD as serialized objects in JVM memory. It takes less memory (it is space-efficient) than `memory-only`, since it keeps objects serialized, at the cost of a few extra CPU cycles to deserialize them.
Flag for controlling the storage of an RDD.
Same as the `memory-only-ser` storage level, but replicates each partition to two cluster nodes.
(merge expr & ms)
Variadic version of `map-concat`.
(merge-in-place bloom-or-cms other)
Params: (other: BloomFilter)
Result: BloomFilter
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/util/sketch/BloomFilter.html
Timestamp: 2020-10-19T01:56:25.741Z
(merge-with left right merge-fn)
Params: (left: Column, right: Column, f: (Column, Column, Column) ⇒ Column)
Result: Column
Merge two given maps, key-wise into a single map using a function.
the left input map column
the right input map column
(key, value1, value2) => new_value, the lambda function to merge the map values
3.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.474Z
(might-contain bloom item)
Params: (item: Any)
Result: Boolean
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/util/sketch/BloomFilter.html
Timestamp: 2020-10-19T01:56:25.742Z
Params: (item: Any) Result: Boolean Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/util/sketch/BloomFilter.html Timestamp: 2020-10-19T01:56:25.742Z
Column: Aggregate function: returns the minimum value of the column in a group.
RelationalGroupedDataset: Compute the min value for each numeric columns for each group.
Column: Aggregate function: returns the minimum value of the column in a group. RelationalGroupedDataset: Compute the min value for each numeric columns for each group.
(minute expr)
Params: (e: Column)
Result: Column
Extracts the minutes as an integer from a given date/timestamp/string.
An integer, or null if the input was a string that could not be cast to a date
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.483Z
Params: (e: Column) Result: Column Extracts the minutes as an integer from a given date/timestamp/string. An integer, or null if the input was a string that could not be cast to a date 1.5.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.483Z
Params: (other: Any)
Result: Column
Modulo (a.k.a. remainder) expression.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.958Z
Params: (other: Any) Result: Column Modulo (a.k.a. remainder) expression. 1.3.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html Timestamp: 2020-10-19T01:56:19.958Z
(monotonically-increasing-id)
Params: ()
Result: Column
A column expression that generates monotonically increasing 64-bit integers.
The generated ID is guaranteed to be monotonically increasing and unique, but not consecutive. The current implementation puts the partition ID in the upper 31 bits, and the record number within each partition in the lower 33 bits. The assumption is that the data frame has less than 1 billion partitions, and each partition has less than 8 billion records.
As an example, consider a DataFrame with two partitions, each with 3 records. This expression would return the following IDs:
(Since version 2.0.0) Use monotonically_increasing_id()
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.744Z
Params: () Result: Column A column expression that generates monotonically increasing 64-bit integers. The generated ID is guaranteed to be monotonically increasing and unique, but not consecutive. The current implementation puts the partition ID in the upper 31 bits, and the record number within each partition in the lower 33 bits. The assumption is that the data frame has less than 1 billion partitions, and each partition has less than 8 billion records. As an example, consider a DataFrame with two partitions, each with 3 records. This expression would return the following IDs: (Since version 2.0.0) Use monotonically_increasing_id() 1.4.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.744Z
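A minimal sketch of attaching a surrogate ID column, assuming `g/with-column` is available as elsewhere in this library. For the two-partition, three-record example above, the upstream docs give the IDs 0, 1, 2, 8589934592 (1L << 33), 8589934593, 8589934594.
```clojure
;; Attach a unique, monotonically increasing (but not consecutive) 64-bit ID.
(-> dataframe
    (g/with-column :id (g/monotonically-increasing-id))
    g/show)
```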
(month expr)
Params: (e: Column)
Result: Column
Extracts the month as an integer from a given date/timestamp/string.
An integer, or null if the input was a string that could not be cast to a date
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.486Z
Params: (e: Column) Result: Column Extracts the month as an integer from a given date/timestamp/string. An integer, or null if the input was a string that could not be cast to a date 1.5.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.486Z
(months-between l-expr r-expr)
Params: (end: Column, start: Column)
Result: Column
Returns number of months between dates start and end.
A whole number is returned if both inputs have the same day of month or both are the last day of their respective months. Otherwise, the difference is calculated assuming 31 days per month.
For example:
A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
A date, timestamp or string. If a string, the data must be in a format that can cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
A double, or null if either end or start were strings that could not be cast to a timestamp. Negative if end is before start
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.490Z
Params: (end: Column, start: Column) Result: Column Returns number of months between dates start and end. A whole number is returned if both inputs have the same day of month or both are the last day of their respective months. Otherwise, the difference is calculated assuming 31 days per month. For example: A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS A date, timestamp or string. If a string, the data must be in a format that can cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS A double, or null if either end or start were strings that could not be cast to a timestamp. Negative if end is before start 1.5.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.490Z
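A minimal sketch of months-between, assuming `g/lit` is available as elsewhere in this library; because both literals share the same day of month, the result is a whole number.
```clojure
;; 2020-03-15 is exactly two months after 2020-01-15.
(g/months-between (g/lit "2020-03-15") (g/lit "2020-01-15"))
;; => a Column evaluating to 2.0
```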
(name-value-seq->dataset map-of-values)
(name-value-seq->dataset spark map-of-values)
Construct a Dataset from an associative map.
(g/show (g/map->dataset {:a [1 2], :b [3 4]}))
; +---+---+
; |a |b |
; +---+---+
; |1 |3 |
; |2 |4 |
; +---+---+
Construct a Dataset from an associative map. ```clojure (g/show (g/map->dataset {:a [1 2], :b [3 4]})) ; +---+---+ ; |a |b | ; +---+---+ ; |1 |3 | ; |2 |4 | ; +---+---+ ```
(nan? expr)
Params:
Result: Column
True if the current expression is NaN.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.927Z
Params: Result: Column True if the current expression is NaN. 1.5.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html Timestamp: 2020-10-19T01:56:19.927Z
(nanvl left-expr right-expr)
Params: (col1: Column, col2: Column)
Result: Column
Returns col1 if it is not NaN, or col2 if col1 is NaN.
Both inputs should be floating point columns (DoubleType or FloatType).
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.492Z
Params: (col1: Column, col2: Column) Result: Column Returns col1 if it is not NaN, or col2 if col1 is NaN. Both inputs should be floating point columns (DoubleType or FloatType). 1.5.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.492Z
(neg? expr)
Returns true if expr
is less than zero, else false.
Returns true if `expr` is less than zero, else false.
(negate expr)
Params: (e: Column)
Result: Column
Unary minus, i.e. negate the expression.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.494Z
Params: (e: Column) Result: Column Unary minus, i.e. negate the expression. 1.3.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.494Z
(next-day expr day-of-week)
Params: (date: Column, dayOfWeek: String)
Result: Column
Returns the first date which is later than the value of the date column that is on the specified day of the week.
For example, next_day('2015-07-27', "Sunday") returns 2015-08-02 because that is the first Sunday after 2015-07-27.
A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
Case insensitive, and accepts: "Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"
A date, or null if date was a string that could not be cast to a date or if dayOfWeek was an invalid value
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.495Z
Params: (date: Column, dayOfWeek: String) Result: Column Returns the first date which is later than the value of the date column that is on the specified day of the week. For example, next_day('2015-07-27', "Sunday") returns 2015-08-02 because that is the first Sunday after 2015-07-27. A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS Case insensitive, and accepts: "Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun" A date, or null if date was a string that could not be cast to a date or if dayOfWeek was an invalid value 1.5.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.495Z
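A minimal sketch of next-day, using the example from the upstream docs; `g/lit` is assumed to be available as elsewhere in this library.
```clojure
;; First Sunday strictly after 2015-07-27.
(g/next-day (g/lit "2015-07-27") "Sunday")
;; => a Column evaluating to 2015-08-02
```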
(nlargest dataframe n-rows expr)
Return the Dataset with the first n-rows
rows ordered by expr
in descending order.
Return the Dataset with the first `n-rows` rows ordered by `expr` in descending order.
Flag for controlling the storage of an RDD.
No caching.
Flag for controlling the storage of an RDD. No caching.
(not expr)
Params: (e: Column)
Result: Column
Inversion of boolean expression, i.e. NOT.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.497Z
Params: (e: Column) Result: Column Inversion of boolean expression, i.e. NOT. 1.3.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.497Z
(not-null? expr)
Params:
Result: Column
True if the current expression is NOT null.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.932Z
Params: Result: Column True if the current expression is NOT null. 1.3.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html Timestamp: 2020-10-19T01:56:19.932Z
(nsmallest dataframe n-rows expr)
Return the Dataset with the first n-rows
rows ordered by expr
in ascending order.
Return the Dataset with the first `n-rows` rows ordered by `expr` in ascending order.
(ntile n)
Params: (n: Int)
Result: Column
Window function: returns the ntile group id (from 1 to n inclusive) in an ordered window partition. For example, if n is 4, the first quarter of the rows will get value 1, the second quarter will get 2, the third quarter will get 3, and the last quarter will get 4.
This is equivalent to the NTILE function in SQL.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.500Z
Params: (n: Int) Result: Column Window function: returns the ntile group id (from 1 to n inclusive) in an ordered window partition. For example, if n is 4, the first quarter of the rows will get value 1, the second quarter will get 2, the third quarter will get 3, and the last quarter will get 4. This is equivalent to the NTILE function in SQL. 1.4.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.500Z
(null-count expr)
Aggregate function: returns the null count of a column.
Aggregate function: returns the null count of a column.
(null-rate expr)
Aggregate function: returns the null rate of a column.
Aggregate function: returns the null rate of a column.
(null? expr)
Params:
Result: Column
True if the current expression is null.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.933Z
Params: Result: Column True if the current expression is null. 1.3.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html Timestamp: 2020-10-19T01:56:19.933Z
(nunique dataframe)
Count distinct observations over all columns in the Dataset.
Count distinct observations over all columns in the Dataset.
(odd? expr)
Returns true if expr
is odd, else false.
Returns true if `expr` is odd, else false.
Flag for controlling the storage of an RDD.
Off-heap refers to objects (serialised to byte array) that are managed by the operating system but stored outside the process heap in native memory (therefore, they are not processed by the garbage collector). Accessing this data is slightly slower than accessing the on-heap storage but still faster than reading/writing from a disk. The downside is that the user has to manually deal with managing the allocated memory.
Flag for controlling the storage of an RDD. Off-heap refers to objects (serialised to byte array) that are managed by the operating system but stored outside the process heap in native memory (therefore, they are not processed by the garbage collector). Accessing this data is slightly slower than accessing the on-heap storage but still faster than reading/writing from a disk. The downside is that the user has to manually deal with managing the allocated memory.
(order-by dataframe & exprs)
Params: (sortCol: String, sortCols: String*)
Result: Dataset[T]
Returns a new Dataset sorted by the given expressions. This is an alias of the sort function.
2.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.884Z
Params: (sortCol: String, sortCols: String*) Result: Dataset[T] Returns a new Dataset sorted by the given expressions. This is an alias of the sort function. 2.0.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html Timestamp: 2020-10-19T01:56:20.884Z
(over column window-spec)
Params: (window: WindowSpec)
Result: Column
Defines a windowing column.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.973Z
Params: (window: WindowSpec) Result: Column Defines a windowing column. 1.4.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html Timestamp: 2020-10-19T01:56:19.973Z
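A minimal sketch of a windowed column via over, assuming `g/window` builds a WindowSpec from a map of `:partition-by`/`:order-by` and that `g/with-column` and `g/desc` are available as elsewhere in this library; `:dept` and `:salary` are hypothetical columns.
```clojure
;; Rank rows within each :dept partition by descending :salary.
(-> dataframe
    (g/with-column :rank
                   (g/over (g/rank)
                           (g/window {:partition-by :dept
                                      :order-by     (g/desc :salary)})))
    g/show)
```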
(overlay src rep pos)
(overlay src rep pos len)
Params: (src: Column, replace: Column, pos: Column, len: Column)
Result: Column
Overlay the specified portion of src with replace, starting from byte position pos of src and proceeding for len bytes.
3.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.503Z
Params: (src: Column, replace: Column, pos: Column, len: Column) Result: Column Overlay the specified portion of src with replace, starting from byte position pos of src and proceeding for len bytes. 3.0.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.503Z
(partitions dataframe)
Params:
Result: List[Partition]
Set of partitions in this RDD.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html
Timestamp: 2020-10-19T01:56:48.891Z
Params: Result: List[Partition] Set of partitions in this RDD. Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html Timestamp: 2020-10-19T01:56:48.891Z
(percent-rank)
Params: ()
Result: Column
Window function: returns the relative rank (i.e. percentile) of rows within a window partition.
This is computed by: (rank of the row within its partition - 1) / (number of rows in the partition - 1).
This is equivalent to the PERCENT_RANK function in SQL.
1.6.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.504Z
Params: () Result: Column Window function: returns the relative rank (i.e. percentile) of rows within a window partition. This is computed by: (rank of the row within its partition - 1) / (number of rows in the partition - 1). This is equivalent to the PERCENT_RANK function in SQL. 1.6.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.504Z
(persist dataframe)
(persist dataframe new-level)
Params: ()
Result: Dataset.this.type
Persist this Dataset with the default storage level (MEMORY_AND_DISK).
1.6.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.886Z
Params: () Result: Dataset.this.type Persist this Dataset with the default storage level (MEMORY_AND_DISK). 1.6.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html Timestamp: 2020-10-19T01:56:20.886Z
The double value that is closer than any other to pi, the ratio of the circumference of a circle to its diameter.
The double value that is closer than any other to pi, the ratio of the circumference of a circle to its diameter.
(pivot grouped expr)
(pivot grouped expr values)
Params: (pivotColumn: String)
Result: RelationalGroupedDataset
Pivots a column of the current DataFrame and performs the specified aggregation.
There are two versions of pivot function: one that requires the caller to specify the list of distinct values to pivot on, and one that does not. The latter is more concise but less efficient, because Spark needs to first compute the list of distinct values internally.
Name of the column to pivot.
1.6.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/RelationalGroupedDataset.html
Timestamp: 2020-10-19T01:56:23.317Z
Params: (pivotColumn: String) Result: RelationalGroupedDataset Pivots a column of the current DataFrame and performs the specified aggregation. There are two versions of pivot function: one that requires the caller to specify the list of distinct values to pivot on, and one that does not. The latter is more concise but less efficient, because Spark needs to first compute the list of distinct values internally. Name of the column to pivot. 1.6.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/RelationalGroupedDataset.html Timestamp: 2020-10-19T01:56:23.317Z
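A minimal sketch of pivot, assuming `g/group-by`, `g/agg`, and `g/sum` are available as elsewhere in this library; `:year`, `:course`, and `:earnings` are hypothetical columns. Supplying the value list avoids the extra pass needed to compute distinct values.
```clojure
;; One output column per course, holding the summed earnings per year.
(-> dataframe
    (g/group-by :year)
    (g/pivot :course ["dotNET" "Java"])
    (g/agg (g/sum :earnings))
    g/show)
```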
(pmod left-expr right-expr)
Params: (dividend: Column, divisor: Column)
Result: Column
Returns the positive value of dividend mod divisor.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.505Z
Params: (dividend: Column, divisor: Column) Result: Column Returns the positive value of dividend mod divisor. 1.5.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.505Z
(pos? expr)
Returns true if expr
is greater than zero, else false.
Returns true if `expr` is greater than zero, else false.
(posexplode expr)
Params: (e: Column)
Result: Column
Creates a new row for each element with position in the given array or map column. Uses the default column name pos for position, and col for elements in the array and key and value for elements in the map unless specified otherwise.
2.1.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.506Z
Params: (e: Column) Result: Column Creates a new row for each element with position in the given array or map column. Uses the default column name pos for position, and col for elements in the array and key and value for elements in the map unless specified otherwise. 2.1.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.506Z
(posexplode-outer expr)
Params: (e: Column)
Result: Column
Creates a new row for each element with position in the given array or map column. Uses the default column name pos for position, and col for elements in the array and key and value for elements in the map unless specified otherwise.
2.1.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.506Z
Params: (e: Column) Result: Column Creates a new row for each element with position in the given array or map column. Uses the default column name pos for position, and col for elements in the array and key and value for elements in the map unless specified otherwise. 2.1.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.506Z
(pow base exponent)
Params: (l: Column, r: Column)
Result: Column
Returns the value of the first argument raised to the power of the second argument.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.520Z
Params: (l: Column, r: Column) Result: Column Returns the value of the first argument raised to the power of the second argument. 1.4.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.520Z
(print-schema dataframe)
Params: ()
Result: Unit
Prints the schema to the console in a nice tree format.
1.6.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.888Z
Params: () Result: Unit Prints the schema to the console in a nice tree format. 1.6.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html Timestamp: 2020-10-19T01:56:20.888Z
(put bloom item)
Params: (item: Any)
Result: Boolean
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/util/sketch/BloomFilter.html
Timestamp: 2020-10-19T01:56:25.746Z
Params: (item: Any) Result: Boolean Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/util/sketch/BloomFilter.html Timestamp: 2020-10-19T01:56:25.746Z
(qcut expr num-buckets-or-probs)
Returns a new Column that discretises expr
into equal-sized buckets, based
on rank or on sample quantiles.
Returns a new Column that discretises `expr` into equal-sized buckets, based on rank or on sample quantiles.
Column: Aggregate function: returns the quantile of the values in a group.
RelationalGroupedDataset: Compute the quantile for each numeric columns for each group.
Column: Aggregate function: returns the quantile of the values in a group. RelationalGroupedDataset: Compute the quantile for each numeric columns for each group.
(quarter expr)
Params: (e: Column)
Result: Column
Extracts the quarter as an integer from a given date/timestamp/string.
An integer, or null if the input was a string that could not be cast to a date
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.521Z
Params: (e: Column) Result: Column Extracts the quarter as an integer from a given date/timestamp/string. An integer, or null if the input was a string that could not be cast to a date 1.5.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.521Z
(radians expr)
Params: (e: Column)
Result: Column
Converts an angle measured in degrees to an approximately equivalent angle measured in radians.
angle in degrees
angle in radians, as if computed by java.lang.Math.toRadians
2.1.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.523Z
Params: (e: Column) Result: Column Converts an angle measured in degrees to an approximately equivalent angle measured in radians. angle in degrees angle in radians, as if computed by java.lang.Math.toRadians 2.1.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.523Z
(rand)
(rand seed)
Params: (seed: Long)
Result: Column
Generate a random column with independent and identically distributed (i.i.d.) samples uniformly distributed in [0.0, 1.0).
1.4.0
The function is non-deterministic in general case.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.526Z
Params: (seed: Long) Result: Column Generate a random column with independent and identically distributed (i.i.d.) samples uniformly distributed in [0.0, 1.0). 1.4.0 The function is non-deterministic in general case. Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.526Z
(rand-nth dataframe)
Returns a random row collected.
Returns a random row collected.
(randn)
(randn seed)
Params: (seed: Long)
Result: Column
Generate a column with independent and identically distributed (i.i.d.) samples from the standard normal distribution.
1.4.0
The function is non-deterministic in general case.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.528Z
Params: (seed: Long) Result: Column Generate a column with independent and identically distributed (i.i.d.) samples from the standard normal distribution. 1.4.0 The function is non-deterministic in general case. Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.528Z
(random-choice choices)
(random-choice choices probs)
(random-choice choices probs seed)
Returns a new Column of a random sample from a given collection of choices
.
Returns a new Column of a random sample from a given collection of `choices`.
(random-exp)
(random-exp rate)
(random-exp rate seed)
Returns a new Column of draws from an exponential distribution.
Returns a new Column of draws from an exponential distribution.
(random-int)
(random-int low high)
(random-int low high seed)
Returns a new Column of random integers from low
(inclusive) to high
(exclusive).
Returns a new Column of random integers from `low` (inclusive) to `high` (exclusive).
(random-norm)
(random-norm mu sigma)
(random-norm mu sigma seed)
Returns a new Column of draws from a normal distribution.
Returns a new Column of draws from a normal distribution.
(random-split dataframe weights)
(random-split dataframe weights seed)
Params: (weights: Array[Double], seed: Long)
Result: Array[Dataset[T]]
Randomly splits this Dataset with the provided weights.
weights for splits, will be normalized if they don't sum to 1.
Seed for sampling. For Java API, use randomSplitAsList.
2.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.892Z
Params: (weights: Array[Double], seed: Long) Result: Array[Dataset[T]] Randomly splits this Dataset with the provided weights. weights for splits, will be normalized if they don't sum to 1. Seed for sampling. For Java API, use randomSplitAsList. 2.0.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html Timestamp: 2020-10-19T01:56:20.892Z
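A minimal sketch of an 80/20 train-test split with a fixed seed, assuming the returned splits can be destructured as a sequence and that `g/count` is available as elsewhere in this library.
```clojure
;; Weights are normalised if they do not sum to 1.
(let [[train test] (g/random-split dataframe [0.8 0.2] 42)]
  {:train-rows (g/count train)
   :test-rows  (g/count test)})
```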
(random-uniform)
(random-uniform low high)
(random-uniform low high seed)
Returns a new Column of draws from a uniform distribution.
Returns a new Column of draws from a uniform distribution.
Creates a Dataset
with a single LongType
column named id
.
The Dataset
contains elements in a range from start
(default 0) to end
(exclusive)
with the given step
(default 1).
If num-partitions
is specified, the dataset will be distributed into the specified number
of partitions. Otherwise, Spark uses internal logic to determine the number of partitions.
Creates a `Dataset` with a single `LongType` column named `id`. The `Dataset` contains elements in a range from `start` (default 0) to `end` (exclusive) with the given `step` (default 1). If `num-partitions` is specified, the dataset will be distributed into the specified number of partitions. Otherwise, Spark uses internal logic to determine the number of partitions.
(rank)
Params: ()
Result: Column
Window function: returns the rank of rows within a window partition.
The difference between rank and dense_rank is that dense_rank leaves no gaps in the ranking sequence when there are ties. That is, if you were ranking a competition using dense_rank and had three people tie for second place, you would say that all three were in second place and that the next person came in third. Rank would give sequential numbers, so the person that came in third place (after the ties) would register as coming in fifth.
This is equivalent to the RANK function in SQL.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.529Z
Params: () Result: Column Window function: returns the rank of rows within a window partition. The difference between rank and dense_rank is that dense_rank leaves no gaps in the ranking sequence when there are ties. That is, if you were ranking a competition using dense_rank and had three people tie for second place, you would say that all three were in second place and that the next person came in third. Rank would give sequential numbers, so the person that came in third place (after the ties) would register as coming in fifth. This is equivalent to the RANK function in SQL. 1.4.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.529Z
(rchoice choices)
(rchoice choices probs)
(rchoice choices probs seed)
Returns a new Column of a random sample from a given collection of choices
.
Returns a new Column of a random sample from a given collection of `choices`.
(rdd dataframe)
Params:
Result: RDD[T]
Represents the content of the Dataset as an RDD of T.
1.6.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.894Z
Params: Result: RDD[T] Represents the content of the Dataset as an RDD of T. 1.6.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html Timestamp: 2020-10-19T01:56:20.894Z
Loads an Avro file and returns the results as a DataFrame.
Spark's DataFrameReader options may be passed in as a map of options.
See: https://spark.apache.org/docs/latest/sql-data-sources.html
Loads an Avro file and returns the results as a DataFrame. Spark's DataFrameReader options may be passed in as a map of options. See: https://spark.apache.org/docs/latest/sql-data-sources.html
Loads a binary file and returns the results as a DataFrame.
Spark's DataFrameReader options may be passed in as a map of options.
See: https://spark.apache.org/docs/latest/sql-data-sources-binaryFile.html
Loads a binary file and returns the results as a DataFrame. Spark's DataFrameReader options may be passed in as a map of options. See: https://spark.apache.org/docs/latest/sql-data-sources-binaryFile.html
Loads a CSV file and returns the results as a DataFrame.
Spark's DataFrameReader options may be passed in as a map of options.
See: https://spark.apache.org/docs/latest/sql-data-sources.html
Loads a CSV file and returns the results as a DataFrame. Spark's DataFrameReader options may be passed in as a map of options. See: https://spark.apache.org/docs/latest/sql-data-sources.html
Loads an EDN file and returns the results as a DataFrame.
Loads an EDN file and returns the results as a DataFrame.
(read-jdbc! options)
(read-jdbc! spark options)
Loads a database table and returns the results as a DataFrame.
Spark's DataFrameReader options may be passed in as a map of options.
See: https://spark.apache.org/docs/latest/sql-data-sources.html
Loads a database table and returns the results as a DataFrame. Spark's DataFrameReader options may be passed in as a map of options. See: https://spark.apache.org/docs/latest/sql-data-sources.html
Loads a JSON file and returns the results as a DataFrame.
Spark's DataFrameReader options may be passed in as a map of options.
See: https://spark.apache.org/docs/latest/sql-data-sources.html
Loads a JSON file and returns the results as a DataFrame. Spark's DataFrameReader options may be passed in as a map of options. See: https://spark.apache.org/docs/latest/sql-data-sources.html
Loads a LIBSVM file and returns the results as a DataFrame.
Spark's DataFrameReader options may be passed in as a map of options.
See: https://spark.apache.org/docs/latest/sql-data-sources.html
Loads a LIBSVM file and returns the results as a DataFrame. Spark's DataFrameReader options may be passed in as a map of options. See: https://spark.apache.org/docs/latest/sql-data-sources.html
Loads a Parquet file and returns the results as a DataFrame.
Spark's DataFrameReader options may be passed in as a map of options.
See: https://spark.apache.org/docs/latest/sql-data-sources-parquet.html
Loads a Parquet file and returns the results as a DataFrame. Spark's DataFrameReader options may be passed in as a map of options. See: https://spark.apache.org/docs/latest/sql-data-sources-parquet.html
Reads a managed (hive) table and returns the result as a DataFrame.
Reads a managed (hive) table and returns the result as a DataFrame.
Loads a text file and returns the results as a DataFrame.
Spark's DataFrameReader options may be passed in as a map of options.
See: https://spark.apache.org/docs/latest/sql-data-sources.html
Loads a text file and returns the results as a DataFrame. Spark's DataFrameReader options may be passed in as a map of options. See: https://spark.apache.org/docs/latest/sql-data-sources.html
Loads an Excel file and returns the results as a DataFrame.
Example options:
{:header true :sheet "Sheet2"}
Loads an Excel file and returns the results as a DataFrame. Example options: ```clojure {:header true :sheet "Sheet2"} ```
(records->dataset records)
(records->dataset spark records)
Construct a Dataset from a collection of maps.
(g/show (g/records->dataset [{:a 1 :b 2} {:a 3 :b 4}]))
; +---+---+
; |a |b |
; +---+---+
; |1 |2 |
; |3 |4 |
; +---+---+
Construct a Dataset from a collection of maps. ```clojure (g/show (g/records->dataset [{:a 1 :b 2} {:a 3 :b 4}])) ; +---+---+ ; |a |b | ; +---+---+ ; |1 |2 | ; |3 |4 | ; +---+---+ ```
(regexp-extract expr regex idx)
Params: (e: Column, exp: String, groupIdx: Int)
Result: Column
Extract a specific group matched by a Java regex, from the specified string column. If the regex did not match, or the specified group did not match, an empty string is returned.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.530Z
Params: (e: Column, exp: String, groupIdx: Int) Result: Column Extract a specific group matched by a Java regex, from the specified string column. If the regex did not match, or the specified group did not match, an empty string is returned. 1.5.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.530Z
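A minimal sketch of regexp-extract using the pattern from the upstream example; `g/lit` is assumed to be available as elsewhere in this library.
```clojure
;; Extract the first capture group from a digit-dash-digit string.
(g/regexp-extract (g/lit "100-200") "(\\d+)-(\\d+)" 1)
;; => a Column evaluating to "100"
```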
(regexp-replace expr pattern-expr replacement-expr)
Params: (e: Column, pattern: String, replacement: String)
Result: Column
Replace all substrings of the specified string value that match regexp with rep.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.532Z
Params: (e: Column, pattern: String, replacement: String) Result: Column Replace all substrings of the specified string value that match regexp with rep. 1.5.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.532Z
(relative-error cms)
Params: ()
Result: Double
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/util/sketch/CountMinSketch.html
Timestamp: 2020-10-19T01:56:26.106Z
Params: () Result: Double Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/util/sketch/CountMinSketch.html Timestamp: 2020-10-19T01:56:26.106Z
(remove dataframe expr)
Returns a new Dataset that only contains elements where the predicate expr evaluates to false.
Returns a new Dataset that only contains elements where the predicate expr evaluates to false.
(rename-columns dataframe rename-map)
Returns a new Dataset with a column renamed according to the rename-map.
Returns a new Dataset with a column renamed according to the rename-map.
(rename-keys expr kmap)
Same as transform-keys
with a map arg.
Same as `transform-keys` with a map arg.
(repartition dataframe & args)
Params: (numPartitions: Int)
Result: Dataset[T]
Returns a new Dataset that has exactly numPartitions partitions.
1.6.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.901Z
Params: (numPartitions: Int) Result: Dataset[T] Returns a new Dataset that has exactly numPartitions partitions. 1.6.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html Timestamp: 2020-10-19T01:56:20.901Z
(repartition-by-range dataframe & args)
Params: (numPartitions: Int, partitionExprs: Column*)
Result: Dataset[T]
Returns a new Dataset partitioned by the given partitioning expressions into numPartitions. The resulting Dataset is range partitioned.
At least one partition-by expression must be specified. When no explicit sort order is specified, "ascending nulls first" is assumed. Note, the rows are not sorted in each partition of the resulting Dataset.
Note that due to performance reasons this method uses sampling to estimate the ranges. Hence, the output may not be consistent, since sampling can return different values. The sample size can be controlled by the config spark.sql.execution.rangeExchange.sampleSizePerPartition.
2.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.904Z
Params: (numPartitions: Int, partitionExprs: Column*) Result: Dataset[T] Returns a new Dataset partitioned by the given partitioning expressions into numPartitions. The resulting Dataset is range partitioned. At least one partition-by expression must be specified. When no explicit sort order is specified, "ascending nulls first" is assumed. Note, the rows are not sorted in each partition of the resulting Dataset. Note that due to performance reasons this method uses sampling to estimate the ranges. Hence, the output may not be consistent, since sampling can return different values. The sample size can be controlled by the config spark.sql.execution.rangeExchange.sampleSizePerPartition. 2.3.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html Timestamp: 2020-10-19T01:56:20.904Z
(replace expr lookup-map)
(replace expr from-value-or-values to-value)
Returns a new Column where from-value-or-values
is replaced with to-value
.
Returns a new Column where `from-value-or-values` is replaced with `to-value`.
(replace-na dataframe cols replacement)
Params: (col: String, replacement: Map[T, T])
Result: DataFrame
Replaces values matching keys in replacement map with the corresponding values.
name of the column to apply the value replacement. If col is "*", replacement is applied on all string, numeric or boolean columns.
value replacement map. Key and value of replacement map must have the same type, and can only be doubles, strings or booleans. The map value can have nulls.
1.3.1
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/DataFrameNaFunctions.html
Timestamp: 2020-10-19T01:56:23.927Z
Params: (col: String, replacement: Map[T, T]) Result: DataFrame Replaces values matching keys in replacement map with the corresponding values. name of the column to apply the value replacement. If col is "*", replacement is applied on all string, numeric or boolean columns. value replacement map. Key and value of replacement map must have the same type, and can only be doubles, strings or booleans. The map value can have nulls. 1.3.1 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/DataFrameNaFunctions.html Timestamp: 2020-10-19T01:56:23.927Z
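A minimal sketch of replace-na with a hypothetical name column holding the sentinel string "UNKNOWN"; the vector-of-strings form for cols is an assumption, and key and value in the replacement map must share a type, as noted above.
```clojure
;; Replace the sentinel "UNKNOWN" with "unnamed" in the name column.
(g/replace-na dataframe ["name"] {"UNKNOWN" "unnamed"})
```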
(resources)
(resources spark)
Params:
Result: Map[String, ResourceInformation]
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html
Timestamp: 2020-10-19T01:56:49.550Z
Params: Result: Map[String, ResourceInformation] Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html Timestamp: 2020-10-19T01:56:49.550Z
(reverse expr)
Params: (e: Column)
Result: Column
Returns a reversed string or an array with reverse order of elements.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.534Z
Params: (e: Column) Result: Column Returns a reversed string or an array with reverse order of elements. 1.5.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.534Z
(rexp)
(rexp rate)
(rexp rate seed)
Returns a new Column of draws from an exponential distribution.
Returns a new Column of draws from an exponential distribution.
(rint expr)
Params: (e: Column)
Result: Column
Returns the double value that is closest in value to the argument and is equal to a mathematical integer.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.536Z
Params: (e: Column) Result: Column Returns the double value that is closest in value to the argument and is equal to a mathematical integer. 1.4.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.536Z
(rlike expr literal)
Params: (literal: String)
Result: Column
SQL RLIKE expression (LIKE with Regex). Returns a boolean column based on a regex match.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.977Z
Params: (literal: String) Result: Column SQL RLIKE expression (LIKE with Regex). Returns a boolean column based on a regex match. 1.3.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html Timestamp: 2020-10-19T01:56:19.977Z
(rnorm)
(rnorm mu sigma)
(rnorm mu sigma seed)
Returns a new Column of draws from a normal distribution.
Returns a new Column of draws from a normal distribution.
(rollup dataframe & exprs)
Params: (cols: Column*)
Result: RelationalGroupedDataset
Create a multi-dimensional rollup for the current Dataset using the specified columns, so we can run aggregation on them. See RelationalGroupedDataset for all the available aggregate functions.
2.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.907Z
Params: (cols: Column*) Result: RelationalGroupedDataset Create a multi-dimensional rollup for the current Dataset using the specified columns, so we can run aggregation on them. See RelationalGroupedDataset for all the available aggregate functions. 2.0.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html Timestamp: 2020-10-19T01:56:20.907Z
(round expr)
Params: (e: Column)
Result: Column
Returns the value of the column e rounded to 0 decimal places with HALF_UP round mode.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.539Z
Params: (e: Column) Result: Column Returns the value of the column e rounded to 0 decimal places with HALF_UP round mode. 1.5.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.539Z
(row & values)
Params: (values: Seq[Any])
Result: Row
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Row$.html
Timestamp: 2020-10-19T01:56:24.277Z
Params: (values: Seq[Any]) Result: Row Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Row$.html Timestamp: 2020-10-19T01:56:24.277Z
(row-number)
Params: ()
Result: Column
Window function: returns a sequential number starting at 1 within a window partition.
1.6.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.540Z
Params: () Result: Column Window function: returns a sequential number starting at 1 within a window partition. 1.6.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.540Z
(rpad expr length pad)
Params: (str: Column, len: Int, pad: String)
Result: Column
Right-pad the string column with pad to a length of len. If the string column is longer than len, the return value is shortened to len characters.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.541Z
Params: (str: Column, len: Int, pad: String) Result: Column Right-pad the string column with pad to a length of len. If the string column is longer than len, the return value is shortened to len characters. 1.5.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.541Z
(rtrim expr)
Params: (e: Column)
Result: Column
Trim the spaces from right end for the specified string value.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.543Z
Params: (e: Column) Result: Column Trim the spaces from right end for the specified string value. 1.5.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.543Z
(runif)
(runif low high)
(runif low high seed)
Returns a new Column of draws from a uniform distribution.
Returns a new Column of draws from a uniform distribution.
(runiform)
(runiform low high)
(runiform low high seed)
Returns a new Column of draws from a uniform distribution.
Returns a new Column of draws from a uniform distribution.
(sample dataframe fraction)
(sample dataframe fraction with-replacement)
Params: (fraction: Double, seed: Long)
Result: Dataset[T]
Returns a new Dataset by sampling a fraction of rows (without replacement), using a user-supplied seed.
Fraction of rows to generate, range [0.0, 1.0].
Seed for sampling.
2.3.0
This is NOT guaranteed to provide exactly the fraction of the count of the given Dataset.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.913Z
Params: (fraction: Double, seed: Long) Result: Dataset[T] Returns a new Dataset by sampling a fraction of rows (without replacement), using a user-supplied seed. Fraction of rows to generate, range [0.0, 1.0]. Seed for sampling. 2.3.0 This is NOT guaranteed to provide exactly the fraction of the count of the given Dataset. Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html Timestamp: 2020-10-19T01:56:20.913Z
(sample-by dataframe expr fractions seed)
Params: (col: String, fractions: Map[T, Double], seed: Long)
Result: DataFrame
Returns a stratified sample without replacement based on the fraction given on each stratum.
stratum type
column that defines strata
sampling fraction for each stratum. If a stratum is not specified, we treat its fraction as zero.
random seed
a new DataFrame that represents the stratified sample
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/DataFrameStatFunctions.html
Timestamp: 2020-10-19T01:56:24.694Z
Params: (col: String, fractions: Map[T, Double], seed: Long) Result: DataFrame Returns a stratified sample without replacement based on the fraction given on each stratum. stratum type column that defines strata sampling fraction for each stratum. If a stratum is not specified, we treat its fraction as zero. random seed a new DataFrame that represents the stratified sample 1.5.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/DataFrameStatFunctions.html Timestamp: 2020-10-19T01:56:24.694Z
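A minimal sketch of stratified sampling with sample-by, using a hypothetical `:label` column; strata not listed in the fractions map are treated as fraction zero and dropped.
```clojure
;; Keep roughly 10% of label-0 rows and 50% of label-1 rows, seeded for repeatability.
(g/sample-by dataframe :label {0 0.1, 1 0.5} 42)
```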
(sc)
(sc spark)
Params:
Result: SparkContext
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html
Timestamp: 2020-10-19T01:56:49.550Z
Params: Result: SparkContext Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html Timestamp: 2020-10-19T01:56:49.550Z
(schema-of-csv expr)
(schema-of-csv expr options)
Params: (csv: String)
Result: Column
Parses a CSV string and infers its schema in DDL format.
a CSV string.
3.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.547Z
Params: (csv: String) Result: Column Parses a CSV string and infers its schema in DDL format. a CSV string. 3.0.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.547Z
(schema-of-json expr)
(schema-of-json expr options)
Params: (json: String)
Result: Column
Parses a JSON string and infers its schema in DDL format.
a JSON string.
2.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.554Z
Params: (json: String) Result: Column Parses a JSON string and infers its schema in DDL format. a JSON string. 2.4.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.554Z
(second expr)
Params: (e: Column)
Result: Column
Extracts the seconds as an integer from a given date/timestamp/string.
An integer, or null if the input was a string that could not be cast to a timestamp
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.555Z
Params: (e: Column) Result: Column Extracts the seconds as an integer from a given date/timestamp/string. An integer, or null if the input was a string that could not be cast to a timestamp 1.5.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.555Z
(select dataframe & exprs)
Params: (cols: Column*)
Result: DataFrame
Selects a set of column based expressions.
2.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.931Z
Params: (cols: Column*) Result: DataFrame Selects a set of column based expressions. 2.0.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html Timestamp: 2020-10-19T01:56:20.931Z
(select-columns dataframe & exprs)
Params: (cols: Column*)
Result: DataFrame
Selects a set of column based expressions.
2.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.931Z
Params: (cols: Column*) Result: DataFrame Selects a set of column based expressions. 2.0.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html Timestamp: 2020-10-19T01:56:20.931Z
(select-expr dataframe & exprs)
Params: (exprs: String*)
Result: DataFrame
Selects a set of SQL expressions. This is a variant of select that accepts SQL expressions.
2.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.933Z
Params: (exprs: String*) Result: DataFrame Selects a set of SQL expressions. This is a variant of select that accepts SQL expressions. 2.0.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html Timestamp: 2020-10-19T01:56:20.933Z
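A minimal sketch of select-expr with hypothetical `:age` and `:delay` columns; the arguments are SQL expression strings rather than Column objects.
```clojure
;; Derived columns expressed directly in SQL.
(-> dataframe
    (g/select-expr "age * 2 AS double_age" "abs(delay) AS abs_delay")
    g/show)
```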
(select-keys expr ks)
Returns a map containing only those entries in map (expr
) whose key is in ks
.
Returns a map containing only those entries in map (`expr`) whose key is in `ks`.
(sequence start stop step)
Params: (start: Column, stop: Column, step: Column)
Result: Column
Generate a sequence of integers from start to stop, incrementing by step.
2.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.557Z
Params: (start: Column, stop: Column, step: Column) Result: Column Generate a sequence of integers from start to stop, incrementing by step. 2.4.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.557Z
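A minimal sketch of sequence, assuming `g/lit` is available as elsewhere in this library.
```clojure
;; Array column [1 3 5 7 9]: start at 1, stop at 10, step by 2.
(g/sequence (g/lit 1) (g/lit 10) (g/lit 2))
```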
(sha-1 expr)
Params: (e: Column)
Result: Column
Calculates the SHA-1 digest of a binary column and returns the value as a 40 character hex string.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.558Z
Params: (e: Column) Result: Column Calculates the SHA-1 digest of a binary column and returns the value as a 40 character hex string. 1.5.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.558Z
(sha-2 expr n-bits)
Params: (e: Column, numBits: Int)
Result: Column
Calculates the SHA-2 family of hash functions of a binary column and returns the value as a hex string.
column to compute SHA-2 on.
one of 224, 256, 384, or 512.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.559Z
Params: (e: Column, numBits: Int) Result: Column Calculates the SHA-2 family of hash functions of a binary column and returns the value as a hex string. column to compute SHA-2 on. one of 224, 256, 384, or 512. 1.5.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.559Z
(sha1 expr)
Params: (e: Column)
Result: Column
Calculates the SHA-1 digest of a binary column and returns the value as a 40 character hex string.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.558Z
Params: (e: Column) Result: Column Calculates the SHA-1 digest of a binary column and returns the value as a 40 character hex string. 1.5.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.558Z
(sha2 expr n-bits)
Params: (e: Column, numBits: Int)
Result: Column
Calculates the SHA-2 family of hash functions of a binary column and returns the value as a hex string.
column to compute SHA-2 on.
one of 224, 256, 384, or 512.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.559Z
Params: (e: Column, numBits: Int) Result: Column Calculates the SHA-2 family of hash functions of a binary column and returns the value as a hex string. column to compute SHA-2 on. one of 224, 256, 384, or 512. 1.5.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.559Z
(shape dataframe)
Returns a vector representing the dimensionality of the Dataset.
Returns a vector representing the dimensionality of the Dataset.
(shift-left expr num-bits)
Params: (e: Column, numBits: Int)
Result: Column
Shift the given value numBits left. If the given value is a long value, this function will return a long value else it will return an integer value.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.560Z
Params: (e: Column, numBits: Int) Result: Column Shift the given value numBits left. If the given value is a long value, this function will return a long value else it will return an integer value. 1.5.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.560Z
(shift-right expr num-bits)
Params: (e: Column, numBits: Int)
Result: Column
(Signed) shift the given value numBits right. If the given value is a long value, it will return a long value else it will return an integer value.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.562Z
Params: (e: Column, numBits: Int) Result: Column (Signed) shift the given value numBits right. If the given value is a long value, it will return a long value else it will return an integer value. 1.5.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.562Z
(shift-right-unsigned expr num-bits)
Params: (e: Column, numBits: Int)
Result: Column
Unsigned shift the given value numBits right. If the given value is a long value, it will return a long value else it will return an integer value.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.563Z
Params: (e: Column, numBits: Int) Result: Column Unsigned shift the given value numBits right. If the given value is a long value, it will return a long value else it will return an integer value. 1.5.0 Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html Timestamp: 2020-10-19T01:56:22.563Z
(show dataframe)
(show dataframe options)
Params: (numRows: Int)
Result: Unit
Displays the Dataset in a tabular form. Strings more than 20 characters will be truncated, and all cells will be aligned right. For example:
Number of rows to show
1.6.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.945Z
(show-vertical dataframe)
(show-vertical dataframe options)
Displays the Dataset in a list-of-records form.
Column: Returns a random permutation of the given array.
Dataset: Shuffles the rows of the Dataset.
(signum expr)
Params: (e: Column)
Result: Column
Computes the signum of the given value.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.566Z
(sin expr)
Params: (e: Column)
Result: Column
angle in radians
sine of the angle, as if computed by java.lang.Math.sin
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.568Z
(sinh expr)
Params: (e: Column)
Result: Column
hyperbolic angle
hyperbolic sine of the given value, as if computed by java.lang.Math.sinh
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.570Z
(size expr)
Params: (e: Column)
Result: Column
Returns length of array or map.
The function returns null for null input if spark.sql.legacy.sizeOfNull is set to false or spark.sql.ansi.enabled is set to true. Otherwise, the function returns -1 for null input. With the default settings, the function returns -1 for null input.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.571Z
(skewness expr)
Params: (e: Column)
Result: Column
Aggregate function: returns the skewness of the values in a group.
1.6.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.574Z
(slice expr start length)
Params: (x: Column, start: Int, length: Int)
Result: Column
Returns an array containing all the elements in x from index start (or starting from the end if start is negative) with the specified length.
the array column to be sliced
the starting index
the length of the slice
2.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.575Z
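A small illustrative sketch (not from the original docstring) that builds an array column with split and slices it; remember that start is 1-based:
(-> (g/table->dataset [["a,b,c,d"]] [:s])
    (g/with-column :middle (g/slice (g/split :s ",") 2 2))  ; => ["b" "c"]
    g/show)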
(sort dataframe & exprs)
Params: (sortCol: String, sortCols: String*)
Result: Dataset[T]
Returns a new Dataset sorted by the given expressions. This is an alias of the sort function.
2.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.884Z
(sort-array expr)
(sort-array expr asc)
Params: (e: Column)
Result: Column
Sorts the input array for the given column in ascending order, according to the natural ordering of the array elements. Null elements will be placed at the beginning of the returned array.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.577Z
(sort-within-partitions dataframe & exprs)
Params: (sortCol: String, sortCols: String*)
Result: Dataset[T]
Returns a new Dataset with each partition sorted by the given expressions.
This is the same operation as "SORT BY" in SQL (Hive QL).
2.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.950Z
(soundex expr)
Params: (e: Column)
Result: Column
Returns the soundex code for the specified expression.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.578Z
(spark-conf spark-session)
Params:
Result: SparkConf
Return a copy of this JavaSparkContext's configuration. The configuration cannot be changed at runtime.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html
Timestamp: 2020-10-19T01:56:49.511Z
(spark-context)
(spark-context spark)
Params:
Result: SparkContext
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html
Timestamp: 2020-10-19T01:56:49.550Z
(spark-home)
(spark-home spark)
Params: ()
Result: Optional[String]
Get Spark's home location from either a value set through the constructor, or the spark.home Java property, or the SPARK_HOME environment variable (in that order of preference). If neither of these is set, return None.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html
Timestamp: 2020-10-19T01:56:49.518Z
(spark-partition-id)
Params: ()
Result: Column
Partition ID.
1.6.0
This is non-deterministic because it depends on data partitioning and task scheduling.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.579Z
(spark-session dataframe)
Params:
Result: SparkSession
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.951Z
Params: (size: Int, indices: Array[Int], values: Array[Double])
Result: Vector
Creates a sparse vector providing its index array and value array.
vector size.
index array, must be strictly increasing.
value array, must have the same length as indices.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/ml/linalg/Vectors$.html
Timestamp: 2020-10-19T01:56:35.350Z
(split expr pattern)
Params: (str: Column, pattern: String)
Result: Column
Splits str around matches of the given pattern.
a string expression to split
a string representing a regular expression. The regex string should be a Java regular expression.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.582Z
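Because the pattern is a Java regular expression, literal regex metacharacters such as '.' must be escaped. An illustrative sketch (not from the original docstring):
(-> (g/table->dataset [["10.0.0.1"]] [:ip])
    (g/with-column :octets (g/split :ip "\\."))  ; => ["10" "0" "0" "1"]
    g/show)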
(sql spark sql-text)
Executes a SQL query using Spark, returning the result as a DataFrame. The dialect that is used for SQL parsing can be configured with 'spark.sql.dialect'.
(g/sql spark "SELECT * FROM my_table")
(sql-context dataframe)
Params:
Result: SQLContext
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.952Z
(sqr expr)
Returns the value of the first argument raised to the power of two.
(sqrt expr)
Params: (e: Column)
Result: Column
Computes the square root of the specified float value.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.584Z
(starts-with expr literal)
Params: (other: Column)
Result: Column
String starts with. Returns a boolean column based on a string match.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.979Z
(std expr)
Params: (e: Column)
Result: Column
Aggregate function: alias for stddev_samp.
1.6.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.586Z
(stddev expr)
Params: (e: Column)
Result: Column
Aggregate function: alias for stddev_samp.
1.6.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.586Z
(stddev-pop expr)
Params: (e: Column)
Result: Column
Aggregate function: returns the population standard deviation of the expression in a group.
1.6.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.593Z
(stddev-samp expr)
Params: (e: Column)
Result: Column
Aggregate function: returns the sample standard deviation of the expression in a group.
1.6.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.586Z
(storage-level dataframe)
Params:
Result: StorageLevel
Get the Dataset's current storage level, or StorageLevel.NONE if not persisted.
2.1.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.954Z
(streaming? dataframe)
Params:
Result: Boolean
Returns true if this Dataset contains one or more sources that continuously return data as it arrives. A Dataset that reads data from a streaming source must be executed as a StreamingQuery using the start() method in DataStreamWriter. Methods that return a single answer, e.g. count() or collect(), will throw an AnalysisException when there is a streaming source present.
2.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.844Z
(struct & exprs)
Params: (cols: Column*)
Result: Column
Creates a new struct column. If the input column is a column in a DataFrame, or a derived column expression that is named (i.e. aliased), its name would be retained as the StructField's name, otherwise, the newly generated StructField's name would be auto generated as col with a suffix index + 1, i.e. col1, col2, col3, ...
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.597Z
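An illustrative sketch (not from the original docstring) that packs two existing columns into a single struct column; the column names are hypothetical:
(-> (g/table->dataset [[1 "a"]] [:id :tag])
    (g/with-column :packed (g/struct :id :tag))
    g/show)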
(struct-field col-name data-type nullable)
Creates a StructField by specifying the name `col-name`, the data type `data-type`, and whether values of this field can be null (`nullable`).
(struct-type & fields)
Creates a StructType with the given list of StructFields `fields`.
(substring expr pos len)
Params: (str: Column, pos: Int, len: Int)
Result: Column
Substring starts at pos and is of length len when str is String type or returns the slice of byte array that starts at pos in byte and is of length len when str is Binary type
1.5.0
The position is not zero based, but 1 based index.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.599Z
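Since the position is 1-based, taking the first three characters uses pos 1 and len 3. An illustrative sketch (not from the original docstring):
(-> (g/table->dataset [["Spark"]] [:word])
    (g/with-column :prefix (g/substring :word 1 3))  ; => "Spa"
    g/show)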
(substring-index expr delim cnt)
Params: (str: Column, delim: String, count: Int)
Result: Column
Returns the substring from string str before count occurrences of the delimiter delim. If count is positive, everything to the left of the final delimiter (counting from the left) is returned. If count is negative, everything to the right of the final delimiter (counting from the right) is returned. substring_index performs a case-sensitive match when searching for delim.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.600Z
Column: Aggregate function: returns the sum of all values in the given column.
RelationalGroupedDataset: Compute the sum for each numeric columns for each group.
(sum-distinct expr)
Params: (e: Column)
Result: Column
Aggregate function: returns the sum of distinct values in the expression.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.604Z
(summary dataframe & stat-names)
Params: (statistics: String*)
Result: DataFrame
Computes specified statistics for numeric and string columns. Available statistics are: count, mean, stddev, min, max, and arbitrary approximate percentiles specified as a percentage (e.g. 75%).
If no statistics are given, this function computes count, mean, stddev, min, approximate quartiles (percentiles at 25%, 50%, and 75%), and max.
This function is meant for exploratory data analysis, as we make no guarantee about the backward compatibility of the schema of the resulting Dataset. If you want to programmatically compute summary statistics, use the agg function instead.
To do a summary for specific columns, first select them.
See also describe for basic statistics.
Statistics from above list to be computed.
2.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.957Z
(table->dataset table col-names)
(table->dataset spark table col-names)
Construct a Dataset from a collection of collections.
(g/show (g/table->dataset [[1 2] [3 4]] [:a :b]))
; +---+---+
; |a |b |
; +---+---+
; |1 |2 |
; |3 |4 |
; +---+---+
(tail dataframe n-rows)
Params: (n: Int)
Result: Array[T]
Returns the last n rows in the Dataset.
Running tail requires moving data into the application's driver process, and doing so with a very large n can crash the driver process with OutOfMemoryError.
3.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.959Z
(tail-vals dataframe n-rows)
Returns the vector values of the last n rows in the Dataset collected.
(take dataframe n-rows)
Params: (n: Int)
Result: Array[T]
Returns the first n rows in the Dataset.
Running take requires moving data into the application's driver process, and doing so with a very large n can crash the driver process with OutOfMemoryError.
1.6.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.961Z
(take-vals dataframe n-rows)
Returns the vector values of the first n rows in the Dataset collected.
(tan expr)
Params: (e: Column)
Result: Column
angle in radians
tangent of the given value, as if computed by java.lang.Math.tan
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.607Z
(tanh expr)
Params: (e: Column)
Result: Column
hyperbolic angle
hyperbolic tangent of the given value, as if computed by java.lang.Math.tanh
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.610Z
(time-window time-expr duration)
(time-window time-expr duration slide)
(time-window time-expr duration slide start)
Params: (timeColumn: Column, windowDuration: String, slideDuration: String, startTime: String)
Result: Column
Bucketize rows into one or more time windows given a timestamp specifying column. Window starts are inclusive but the window ends are exclusive, e.g. 12:05 will be in the window [12:05,12:10) but not in [12:00,12:05). Windows can support microsecond precision. Windows in the order of months are not supported. The following example takes the average stock price for a one minute window every 10 seconds starting 5 seconds after the hour:
The windows will look like:
For a streaming query, you may use the function current_timestamp to generate windows on processing time.
The column or the expression to use as the timestamp for windowing by time. The time column must be of TimestampType.
A string specifying the width of the window, e.g. 10 minutes, 1 second. Check org.apache.spark.unsafe.types.CalendarInterval for valid duration identifiers. Note that the duration is a fixed length of time, and does not vary over time according to a calendar. For example, 1 day always means 86,400,000 milliseconds, not a calendar day.
A string specifying the sliding interval of the window, e.g. 1 minute. A new window will be generated every slideDuration. Must be less than or equal to the windowDuration. Check org.apache.spark.unsafe.types.CalendarInterval for valid duration identifiers. This duration is likewise absolute, and does not vary according to a calendar.
The offset with respect to 1970-01-01 00:00:00 UTC with which to start window intervals. For example, in order to have hourly tumbling windows that start 15 minutes past the hour, e.g. 12:15-13:15, 13:15-14:15... provide startTime as 15 minutes.
2.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.732Z
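A hedged sketch (not from the original docstring) that only materialises the window column; it assumes a DataFrame df with a TimestampType column :event-time, and in practice the window column is typically fed into a group-by/aggregation:
(-> df
    (g/with-column :window (g/time-window :event-time "10 minutes"))
    g/show)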
(to-byte-array cms)
Params: ()
Result: Array[Byte]
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/util/sketch/CountMinSketch.html
Timestamp: 2020-10-19T01:56:26.107Z
(to-csv expr)
(to-csv expr options)
Params: (e: Column, options: Map[String, String])
Result: Column
(Java-specific) Converts a column containing a StructType into a CSV string with the specified schema. Throws an exception, in the case of an unsupported type.
a column containing a struct.
options to control how the struct column is converted into a CSV string. It accepts the same options as the CSV data source.
3.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.613Z
(to-date expr)
(to-date expr date-format)
Params: (e: Column)
Result: Column
Converts the column into DateType by casting rules to DateType.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.616Z
Coerce to string; useful for debugging.
Collection: alias for `table->dataset`. Dataset: Converts this strongly typed collection of data to generic DataFrame with columns renamed.
Column: Converts a column containing a StructType, ArrayType or a MapType into a JSON string with the specified schema.
Dataset: Returns the content of the Dataset as a Dataset of JSON strings.
(to-timestamp expr)
(to-timestamp expr date-format)
Params: (s: Column)
Result: Column
Converts to a timestamp by casting rules to TimestampType.
A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
A timestamp, or null if the input was a string that could not be cast to a timestamp
2.2.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.623Z
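An illustrative sketch (not from the original docstring) of the two-argument arity, parsing a string column with an explicit format:
(-> (g/table->dataset [["2020-10-19 01:56:22"]] [:raw])
    (g/with-column :ts (g/to-timestamp :raw "yyyy-MM-dd HH:mm:ss"))
    g/show)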
(to-utc-timestamp expr)
Params: (ts: Column, tz: String)
Result: Column
Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in the given time zone, and renders that time as a timestamp in UTC. For example, 'GMT+1' would yield '2017-07-14 01:40:00.0'.
A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
A string detailing the time zone ID that the input should be adjusted to. It should be in the format of either region-based zone IDs or zone offsets. Region IDs must have the form 'area/city', such as 'America/Los_Angeles'. Zone offsets must be in the format '(+|-)HH:mm', for example '-08:00' or '+01:00'. Also 'UTC' and 'Z' are supported as aliases of '+00:00'. Other short names are not recommended to use because they can be ambiguous.
A timestamp, or null if ts was a string that could not be cast to a timestamp or tz was an invalid value
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.626Z
(total-count cms)
Params: ()
Result: Long
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/util/sketch/CountMinSketch.html
Timestamp: 2020-10-19T01:56:26.108Z
(transform expr xform-fn)
Params: (column: Column, f: (Column) ⇒ Column)
Result: Column
Returns an array of elements after applying a transformation to each element in the input array.
the input array column
col => transformed_col, the lambda function to transform the input column
3.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.629Z
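A hedged sketch (not from the original docstring), assuming the wrapper accepts an ordinary Clojure function of one Column for the lambda argument, mirroring the (Column) ⇒ Column signature above:
(-> (g/table->dataset [["a,b,c"]] [:csv])
    (g/with-column :xs (g/split :csv ","))
    (g/with-column :upper-xs (g/transform :xs g/upper))  ; => ["A" "B" "C"]
    g/show)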
(transform-keys expr key-fn)
Params: (expr: Column, f: (Column, Column) ⇒ Column)
Result: Column
Applies a function to every key-value pair in a map and returns a map with the results of those applications as the new keys for the pairs.
the input map column
(key, value) => new_key, the lambda function to transform the key of input map column
3.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.630Z
(transform-values expr key-fn)
Params: (expr: Column, f: (Column, Column) ⇒ Column)
Result: Column
Applies a function to every key-value pair in a map and returns a map with the results of those applications as the new values for the pairs.
the input map column
(key, value) => new_value, the lambda function to transform the value of input map column
3.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.638Z
(translate expr match replacement)
Params: (src: Column, matchingString: String, replaceString: String)
Result: Column
Translate any character in the src by a character in replaceString. The characters in replaceString correspond to the characters in matchingString. The translate will happen when any character in the string matches the character in the matchingString.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.639Z
(trim expr trim-string)
Params: (e: Column)
Result: Column
Trim the spaces from both ends for the specified string column.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.641Z
(unbase-64 expr)
Params: (e: Column)
Result: Column
Decodes a BASE64 encoded string column and returns it as a binary column. This is the reverse of base64.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.702Z
(unbase64 expr)
Params: (e: Column)
Result: Column
Decodes a BASE64 encoded string column and returns it as a binary column. This is the reverse of base64.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.702Z
Params:
Result: Long
Value representing the last row in the partition, equivalent to "UNBOUNDED FOLLOWING" in SQL. This can be used to specify the frame boundaries:
2.1.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/expressions/Window$.html
Timestamp: 2020-10-19T01:56:25.054Z
Params:
Result: Long
Value representing the first row in the partition, equivalent to "UNBOUNDED PRECEDING" in SQL. This can be used to specify the frame boundaries:
2.1.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/expressions/Window$.html
Timestamp: 2020-10-19T01:56:25.055Z
(unhex expr)
Params: (column: Column)
Result: Column
Inverse of hex. Interprets each pair of characters as a hexadecimal number and converts to the byte representation of number.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.703Z
(union & dataframes)
Params: (other: Dataset[T])
Result: Dataset[T]
Returns a new Dataset containing union of rows in this Dataset and another Dataset.
This is equivalent to UNION ALL in SQL. To do a SQL-style set union (that does deduplication of elements), use this function followed by a distinct.
Also as standard in SQL, this function resolves columns by position (not by name):
Notice that the column positions in the schema aren't necessarily matched with the fields in the strongly typed objects in a Dataset. This function resolves columns by their positions in the schema, not the fields in the strongly typed objects. Use unionByName to resolve columns by field name in the typed objects.
2.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.974Z
(union-by-name & dataframes)
Params: (other: Dataset[T])
Result: Dataset[T]
Returns a new Dataset containing union of rows in this Dataset and another Dataset.
This is different from both UNION ALL and UNION DISTINCT in SQL. To do a SQL-style set union (that does deduplication of elements), use this function followed by a distinct.
The difference between this function and union is that this function resolves columns by name (not by position):
2.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.978Z
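An illustrative sketch (not from the original docstrings) contrasting positional and name-based resolution; the columns and values are hypothetical:
(let [df-a (g/table->dataset [[1 10]] [:id :score])
      df-b (g/table->dataset [[20 2]] [:score :id])]
  (g/show (g/union df-a df-b))          ; resolves by position, so 20 lands under :id
  (g/show (g/union-by-name df-a df-b))) ; resolves by name, so :id keeps 1 and 2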
(unix-timestamp)
(unix-timestamp expr)
(unix-timestamp expr pattern)
Params: ()
Result: Column
Returns the current Unix timestamp (in seconds) as a long.
1.5.0
All calls of unix_timestamp within the same query return the same value (i.e. the current timestamp is calculated at the start of query evaluation).
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.710Z
(unpersist dataframe)
(unpersist dataframe blocking)
Params: (blocking: Boolean)
Result: Dataset.this.type
Mark the Dataset as non-persistent, and remove all blocks for it from memory and disk. This will not un-persist any cached data that is built upon this Dataset.
Whether to block until all blocks are deleted.
1.6.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.980Z
Column: `transform-values` with Clojure's `assoc` signature. Dataset: `with-column` with Clojure's `assoc` signature.
(upper expr)
Params: (e: Column)
Result: Column
Converts a string column to upper case.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.712Z
(vals expr)
Params: (e: Column)
Result: Column
Returns an unordered array containing the values of the map.
2.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.473Z
(value-counts dataframe)
Returns a Dataset containing counts of unique rows.
The resulting object will be in descending order so that the first element is the most frequently-occurring element.
(var-pop expr)
Params: (e: Column)
Result: Column
Aggregate function: returns the population variance of the values in a group.
1.6.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.714Z
(var-samp expr)
Params: (e: Column)
Result: Column
Aggregate function: returns the unbiased variance of the values in a group.
1.6.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.718Z
(variance expr)
Params: (e: Column)
Result: Column
Aggregate function: alias for var_samp.
1.6.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.718Z
(version)
(version spark)
Params:
Result: String
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html
Timestamp: 2020-10-19T01:56:49.576Z
(week-of-year expr)
Params: (e: Column)
Result: Column
Extracts the week number as an integer from a given date/timestamp/string.
A week is considered to start on a Monday and week 1 is the first week with more than 3 days, as defined by ISO 8601
An integer, or null if the input was a string that could not be cast to a date
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.723Z
(weekofyear expr)
Params: (e: Column)
Result: Column
Extracts the week number as an integer from a given date/timestamp/string.
A week is considered to start on a Monday and week 1 is the first week with more than 3 days, as defined by ISO 8601
An integer, or null if the input was a string that could not be cast to a date
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.723Z
(when condition if-expr)
(when condition if-expr else-expr)
Params: (condition: Column, value: Any)
Result: Column
Evaluates a list of conditions and returns one of multiple possible result expressions. If otherwise is not defined at the end, null is returned for unmatched conditions.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.724Z
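An illustrative sketch (not from the original docstring) of the three-argument arity; it assumes the library's < comparison is available for Columns:
(-> (g/table->dataset [[15] [42]] [:age])
    (g/with-column :group (g/when (g/< :age 18) "minor" "adult"))
    g/show)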
Column: Returns an array of elements for which a predicate holds in a given array.
Dataset: Filters rows using the given condition.
(width cms)
Params: ()
Result: Int
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/util/sketch/CountMinSketch.html
Timestamp: 2020-10-19T01:56:26.108Z
(window {:keys [partition-by order-by range-between rows-between]})
Utility functions for defining window in DataFrames.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/expressions/Window$.html
Timestamp: 2020-10-19T01:55:47.755Z
(windowed options)
Shortcut to create WindowSpec that takes a map as the argument.
Expected keys: [:partition-by :order-by :range-between :rows-between]
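A hedged sketch (not from the original docstring); it assumes g/over and g/row-number exist in the library, and the :dept and :salary columns are hypothetical:
(-> df
    (g/with-column :rank
      (g/over (g/row-number)
              (g/windowed {:partition-by :dept :order-by :salary})))
    g/show)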
(with-column dataframe col-name expr)
Params: (colName: String, col: Column)
Result: DataFrame
Returns a new Dataset by adding a column or replacing the existing column that has the same name.
column's expression must only refer to attributes supplied by this Dataset. It is an error to add a column that refers to some other Dataset.
2.0.0
this method introduces a projection internally. Therefore, calling it multiple times, for instance, via loops in order to add multiple columns can generate big plans which can cause performance issues and even StackOverflowException. To avoid this, use select with the multiple columns at once.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.987Z
(with-column-renamed dataframe old-name new-name)
Params: (existingName: String, newName: String)
Result: DataFrame
Returns a new Dataset with a column renamed. This is a no-op if schema doesn't contain existingName.
2.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Dataset.html
Timestamp: 2020-10-19T01:56:20.988Z
(write-avro! dataframe path)
(write-avro! dataframe path options)
Writes an Avro file at the specified path.
Spark's DataFrameWriter options may be passed in as a map of options.
See: https://spark.apache.org/docs/latest/sql-data-sources.html
(write-csv! dataframe path)
(write-csv! dataframe path options)
Writes a CSV file at the specified path.
Spark's DataFrameWriter options may be passed in as a map of options.
See: https://spark.apache.org/docs/latest/sql-data-sources.html
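For example, a sketch of writing with a couple of DataFrameWriter options. The dataframe `df`, the output path, and the string option keys (mirroring Spark's own CSV option names) are assumptions:
;; Sketch only: `g`, `df`, the path, and the option-key format are assumptions.
(g/write-csv! df "/tmp/report-csv" {"header" "true"
                                    "sep"    "|"})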
(write-edn! dataframe path)
(write-edn! dataframe path options)
Writes an EDN file at the specified path.
(write-jdbc! dataframe options)
Writes a database table.
Spark's DataFrameWriter options may be passed in as a map of options.
See: https://spark.apache.org/docs/latest/sql-data-sources.html
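A sketch using Spark's standard JDBC writer options. The URL, table, and credentials are placeholders, and string option keys are assumed:
;; Sketch only: `g`, `df`, and all option values are placeholders.
(g/write-jdbc! df {"url"      "jdbc:postgresql://localhost:5432/mydb"
                   "dbtable"  "public.sales"
                   "user"     "spark"
                   "password" "secret"})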
(write-json! dataframe path)
(write-json! dataframe path options)
Writes a JSON file at the specified path.
Spark's DataFrameWriter options may be passed in as a map of options.
See: https://spark.apache.org/docs/latest/sql-data-sources-json.html
(write-libsvm! dataframe path)
(write-libsvm! dataframe path options)
Writes a LIBSVM file at the specified path.
Spark's DataFrameWriter options may be passed in as a map of options.
See: https://spark.apache.org/docs/latest/sql-data-sources.html
(write-parquet! dataframe path)
(write-parquet! dataframe path options)
Writes a Parquet file at the specified path.
Spark's DataFrameWriter options may be passed in as a map of options.
See: https://spark.apache.org/docs/latest/sql-data-sources-parquet.html
(write-table! dataframe table-name)
(write-table! dataframe table-name options)
Writes the dataset to a managed (Hive) table.
(write-text! dataframe path)
(write-text! dataframe path options)
Writes a text file at the specified path.
Spark's DataFrameWriter options may be passed in as a map of options.
See: https://spark.apache.org/docs/latest/sql-data-sources.html
(write-xlsx! dataframe path)
(write-xlsx! dataframe path options)
Writes an Excel file at the specified path.
(xxhash-64 & exprs)
Params: (cols: Column*)
Result: Column
Calculates the hash code of given columns using the 64-bit variant of the xxHash algorithm, and returns the result as a long column.
3.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.733Z
(xxhash64 & exprs)
Params: (cols: Column*)
Result: Column
Calculates the hash code of given columns using the 64-bit variant of the xxHash algorithm, and returns the result as a long column.
3.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.733Z
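For example, a sketch that adds a 64-bit hash over two columns. The dataframe `df`, the column names, and the `g`/`col` helpers are assumptions:
;; Sketch only: `g`, `df`, `col`, and the column names are assumptions.
(-> df
    (g/with-column "row-hash" (g/xxhash64 (g/col "id") (g/col "email"))))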
(year expr)
Params: (e: Column)
Result: Column
Extracts the year as an integer from a given date/timestamp/string.
Returns an integer, or null if the input was a string that could not be cast to a date.
1.5.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.734Z
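A minimal sketch. The dataframe `df`, the "purchased-at" column, and the `g`/`col` helpers are assumptions:
;; Sketch only: `g`, `df`, `col`, and the column name are assumptions.
(-> df
    (g/with-column "purchase-year" (g/year (g/col "purchased-at"))))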
(zero? expr)
Returns true if `expr` is zero, else false.
(zip-with left right merge-fn)
Params: (left: Column, right: Column, f: (Column, Column) ⇒ Column)
Result: Column
Merge two given arrays, element-wise, into a single array using a function. If one array is shorter, nulls are appended at the end to match the length of the longer array, before applying the function.
left: the left input array column.
right: the right input array column.
merge-fn: (lCol, rCol) => col, the lambda function to merge two input columns into one column.
3.0.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.737Z
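For example, a sketch that sums two array columns element-wise, using the library's + as the two-argument merge function. The dataframe `df`, the column names, and the `g`/`col` helpers are assumptions:
;; Sketch only: `g`, `df`, `col`, and the column names are assumptions.
(-> df
    (g/with-column "totals" (g/zip-with (g/col "xs") (g/col "ys") g/+)))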
(zipmap key-expr val-expr)
Params: (keys: Column, values: Column)
Result: Column
Creates a new map column. The array in the first column is used for keys. The array in the second column is used for values. All elements in the array for key should not be null.
2.4
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/functions$.html
Timestamp: 2020-10-19T01:56:22.470Z
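A minimal sketch that builds a map column from a keys array and a values array. The dataframe `df`, the column names, and the `g`/`col` helpers are assumptions:
;; Sketch only: `g`, `df`, `col`, and the column names are assumptions.
(-> df
    (g/with-column "kv" (g/zipmap (g/col "ks") (g/col "vs"))))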
(| left-expr right-expr)
Params: (other: Any)
Result: Column
Compute bitwise OR of this expression with another expression.
1.4.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.879Z
(|| & exprs)
Params: (other: Any)
Result: Column
Boolean OR.
1.3.0
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/sql/Column.html
Timestamp: 2020-10-19T01:56:19.994Z