
zero-one.geni.rdd


aggregate (clj)

(aggregate rdd zero seq-op comb-op)

Params: (zeroValue: U)

(seqOp: Function2[U, T, U], combOp: Function2[U, U, U])

Result: U

Aggregate the elements of each partition, and then the results for all the partitions, using given combine functions and a neutral "zero value". This function can return a different result type, U, than the type of this RDD, T. Thus, we need one operation for merging a T into an U and one operation for merging two U's, as in scala.TraversableOnce. Both of these functions are allowed to modify and return their first argument instead of creating a new U to avoid memory allocation.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.803Z

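A minimal sketch of calling aggregate from Geni, assuming the one-argument parallelise uses Geni's default SparkSession and that plain Clojure functions are accepted for seq-op and comb-op:

(require '[zero-one.geni.rdd :as rdd])

;; sum the elements: 0 is the neutral zero value, + is both the
;; per-partition seq-op and the across-partition comb-op
(def nums (rdd/parallelise [1 2 3 4 5]))
(rdd/aggregate nums 0 + +)
;; expected to return 15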

aggregate-by-key (clj)

(aggregate-by-key rdd zero seq-fn comb-fn)
(aggregate-by-key rdd zero num-partitions seq-fn comb-fn)

Params: (zeroValue: U, partitioner: Partitioner, seqFunc: Function2[U, V, U], combFunc: Function2[U, U, U])

Result: JavaPairRDD[K, U]

Aggregate the values of each key, using given combine functions and a neutral "zero value". This function can return a different result type, U, than the type of the values in this RDD, V. Thus, we need one operation for merging a V into a U and one operation for merging two U's, as in scala.TraversableOnce. The former operation is used for merging values within a partition, and the latter is used for merging values between partitions. To avoid memory allocation, both of these functions are allowed to modify and return their first argument instead of creating a new U.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaPairRDD.html

Timestamp: 2020-10-19T01:56:48.007Z

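A hedged sketch that sums the values per key, assuming the rdd alias from the aggregate example and that key-by (documented further down) can be used to build a pair RDD:

;; assumes (require '[zero-one.geni.rdd :as rdd])
(def pairs (rdd/key-by (rdd/parallelise [1 2 3 4 5 6]) even?))
;; zero value 0; seq-fn + merges a value into the accumulator,
;; comb-fn + merges two accumulators
(-> (rdd/aggregate-by-key pairs 0 + +)
    rdd/collect)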

app-name (clj)

(app-name)
(app-name spark)

Params:

Result: String

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html

Timestamp: 2020-10-19T01:56:49.487Z


binary-files (clj multimethod)

Params: (path: String, minPartitions: Int)

Result: JavaPairRDD[String, PortableDataStream]

Read a directory of binary files from HDFS, a local file system (available on all nodes), or any Hadoop-supported file system URI as a byte array. Each file is read as a single record and returned in a key-value pair, where the key is the path of each file, the value is the content of each file.

minPartitions is a suggested value for the minimal number of splits of the input data; see the linked Spark documentation for a worked example of the resulting key-value pairs.

Small files are preferred; very large files may cause bad performance.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html

Timestamp: 2020-10-19T01:56:49.492Z


broadcast (clj)

(broadcast value)
(broadcast spark value)

Params: (value: T)

Result: Broadcast[T]

Broadcast a read-only variable to the cluster, returning a org.apache.spark.broadcast.Broadcast object for reading it in distributed functions. The variable will be sent to each cluster only once.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html

Timestamp: 2020-10-19T01:56:49.495Z

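A small sketch of broadcasting a lookup table and reading it inside a distributed function; .value is the standard accessor on org.apache.spark.broadcast.Broadcast, and the one-argument arity is assumed to use Geni's default SparkSession:

;; assumes (require '[zero-one.geni.rdd :as rdd])
(def lookup-table (rdd/broadcast {"a" 1 "b" 2}))
(-> (rdd/parallelise ["a" "b" "a"])
    (rdd/map (fn [k] (get (.value lookup-table) k)))
    rdd/collect)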

cache (clj)

(cache rdd)

Params: ()

Result: JavaRDD[T]

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.805Z


cartesian (clj)

(cartesian)
(cartesian rdd)
(cartesian left right)
(cartesian left right & rdds)

Params: (other: JavaRDDLike[U, _])

Result: JavaPairRDD[T, U]

Return the Cartesian product of this RDD and another one, that is, the RDD of all pairs of elements (a, b) where a is in this and b is in other.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.807Z

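For example (a sketch, assuming the rdd alias used above), the Cartesian product of two two-element RDDs yields four pairs:

;; assumes (require '[zero-one.geni.rdd :as rdd])
(-> (rdd/cartesian (rdd/parallelise [1 2])
                   (rdd/parallelise [:a :b]))
    rdd/collect)
;; four (number, keyword) pairs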

checkpoint-dir (clj)

(checkpoint-dir)
(checkpoint-dir spark)

Params:

Result: Optional[String]

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html

Timestamp: 2020-10-19T01:56:49.509Z


checkpointed? (clj)

(checkpointed? rdd)

Params:

Result: Boolean

Return whether this RDD has been checkpointed or not

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.861Z


coalesce (clj)

(coalesce rdd num-partitions)
(coalesce rdd num-partitions shuffle)

Params: (numPartitions: Int)

Result: JavaRDD[T]

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.812Z


cogroup (clj)

(cogroup this other1)
(cogroup this other1 other2)
(cogroup this other1 other2 other3)

Params: (other: JavaPairRDD[K, W], partitioner: Partitioner)

Result: JavaPairRDD[K, (Iterable[V], Iterable[W])]

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaPairRDD.html

Timestamp: 2020-10-19T01:56:48.034Z


collect (clj)

(collect rdd)

Params: ()

Result: List[T]

Return an array that contains all of the elements in this RDD.

Note: this method should only be used if the resulting array is expected to be small, as all the data is loaded into the driver's memory.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.813Z

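A minimal end-to-end sketch, assuming the one-argument parallelise uses Geni's default SparkSession:

;; assumes (require '[zero-one.geni.rdd :as rdd])
(-> (rdd/parallelise [1 2 3 4 5])
    (rdd/map inc)
    rdd/collect)
;; brings every (incremented) element back to the driver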

collect-async (clj)

(collect-async rdd)

Params: ()

Result: JavaFutureAction[List[T]]

The asynchronous version of collect, which returns a future for retrieving an array containing all of the elements in this RDD.

Note: this method should only be used if the resulting array is expected to be small, as all the data is loaded into the driver's memory.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.814Z


collect-partitions (clj)

(collect-partitions rdd partition-ids)

Params: (partitionIds: Array[Int])

Result: Array[List[T]]

Return an array that contains all of the elements in a specific partition of this RDD.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.816Z


combine-by-key (clj)

(combine-by-key rdd create-fn merge-value-fn merge-combiner-fn)
(combine-by-key rdd
                create-fn
                merge-value-fn
                merge-combiner-fn
                partitions-or-partitioner)

Params: (createCombiner: Function[V, C], mergeValue: Function2[C, V, C], mergeCombiners: Function2[C, C, C], partitioner: Partitioner, mapSideCombine: Boolean, serializer: Serializer)

Result: JavaPairRDD[K, C]

Generic function to combine the elements for each key using a custom set of aggregation functions. Turns a JavaPairRDD[(K, V)] into a result of type JavaPairRDD[(K, C)], for a "combined type" C.

Users provide three functions: createCombiner, which turns a V into a C (e.g. creates a one-element list); mergeValue, to merge a V into a C (e.g. adds it to the end of a list); and mergeCombiners, to combine two C's into a single one.

In addition, users can control the partitioning of the output RDD, the serializer that is used for the shuffle, and whether to perform map-side aggregation (if a mapper can produce multiple items with the same key).

V and C can be different -- for example, one might group an RDD of type (Int, Int) into an RDD of type (Int, List[Int]).

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaPairRDD.html

Timestamp: 2020-10-19T01:56:48.051Z

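A hedged sketch of the classic "group values into lists" use of combine-by-key, assuming a pair RDD built with key-by and that plain Clojure functions are accepted for the three combiners:

;; assumes (require '[zero-one.geni.rdd :as rdd])
(def pairs (rdd/key-by (rdd/parallelise (range 6)) even?))
(-> (rdd/combine-by-key pairs
                        vector   ;; create-fn: start a list from a single value
                        conj     ;; merge-value-fn: add a value to a list
                        into)    ;; merge-combiner-fn: merge two lists
    rdd/collect)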

conf (clj)

(conf)
(conf spark)

Params:

Result: SparkConf

Return a copy of this JavaSparkContext's configuration. The configuration cannot be changed at runtime.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html

Timestamp: 2020-10-19T01:56:49.511Z


context (clj)

(context rdd)

Params:

Result: SparkContext

The org.apache.spark.SparkContext that this RDD was created on.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.817Z


count (clj)

(count rdd)

Params: ()

Result: Long

Return the number of elements in the RDD.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.818Z


count-approx (clj)

(count-approx rdd timeout)
(count-approx rdd timeout confidence)

Params: (timeout: Long, confidence: Double)

Result: PartialResult[BoundedDouble]

Approximate version of count() that returns a potentially incomplete result within a timeout, even if not all tasks have finished.

The confidence is the probability that the error bounds of the result will contain the true value. That is, if countApprox were called repeatedly with confidence 0.9, we would expect 90% of the results to contain the true count. The confidence must be in the range [0,1] or an exception will be thrown.

timeout: maximum time to wait for the job, in milliseconds.

confidence: the desired statistical confidence in the result.

Returns a potentially incomplete result, with error bounds.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.820Z

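count-approx returns a PartialResult; final-value (documented further down) blocks for the final value. A sketch, assuming the rdd alias used above:

;; assumes (require '[zero-one.geni.rdd :as rdd])
(-> (rdd/parallelise (range 1000))
    (rdd/count-approx 100)   ;; wait at most 100 ms for a partial answer
    rdd/final-value)         ;; blocks until the full result is available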

count-approx-distinct (clj)

(count-approx-distinct rdd relative-sd)

Params: (relativeSD: Double)

Result: Long

Return approximate number of distinct elements in the RDD.

The algorithm used is based on streamlib's implementation of "HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardinality Estimation Algorithm".

relativeSD: relative accuracy. Smaller values create counters that require more space. It must be greater than 0.000017.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.822Z


count-approx-distinct-by-key (clj)

(count-approx-distinct-by-key rdd relative-sd)
(count-approx-distinct-by-key rdd relative-sd partitions-or-partitioner)

Params: (relativeSD: Double, partitioner: Partitioner)

Result: JavaPairRDD[K, Long]

Return approximate number of distinct values for each key in this RDD.

The algorithm used is based on streamlib's implementation of "HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardinality Estimation Algorithm".

relativeSD: relative accuracy. Smaller values create counters that require more space. It must be greater than 0.000017.

partitioner: partitioner of the resulting RDD.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaPairRDD.html

Timestamp: 2020-10-19T01:56:48.061Z


count-async (clj)

(count-async rdd)

Params: ()

Result: JavaFutureAction[Long]

The asynchronous version of count, which returns a future for counting the number of elements in this RDD.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.823Z


count-by-key (clj)

(count-by-key rdd)

Params: ()

Result: Map[K, Long]

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaPairRDD.html

Timestamp: 2020-10-19T01:56:48.063Z


count-by-key-approx (clj)

(count-by-key-approx rdd timeout)
(count-by-key-approx rdd timeout confidence)

Params: (timeout: Long)

Result: PartialResult[Map[K, BoundedDouble]]

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaPairRDD.html

Timestamp: 2020-10-19T01:56:48.065Z


count-by-value (clj)

(count-by-value rdd)

Params: ()

Result: Map[T, Long]

Return the count of each unique value in this RDD as a map of (value, count) pairs. The final combine step happens locally on the master, equivalent to running a single reduce task.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.824Z

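For instance (a sketch, assuming the rdd alias used above):

;; assumes (require '[zero-one.geni.rdd :as rdd])
(rdd/count-by-value (rdd/parallelise ["a" "b" "a" "c" "a"]))
;; a map from value to count, e.g. "a" -> 3, "b" -> 1, "c" -> 1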

default-min-partitions (clj)

(default-min-partitions)
(default-min-partitions spark)

Params:

Result: Integer

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html

Timestamp: 2020-10-19T01:56:49.503Z


default-parallelism (clj)

(default-parallelism)
(default-parallelism spark)

Params:

Result: Integer

Default level of parallelism to use when not given by user (e.g. parallelize and makeRDD).

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html

Timestamp: 2020-10-19T01:56:49.504Z


disk-only (clj)

Flag for controlling the storage of an RDD.

The DataFrame is stored only on disk, and CPU computation time is high because of the I/O involved.


disk-only-2 (clj)

Flag for controlling the storage of an RDD.

Same as disk-only storage level but replicate each partition to two cluster nodes.


distinct (clj)

(distinct rdd)
(distinct rdd num-partitions)

Params: ()

Result: JavaRDD[T]

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.829Z


empty-rdd (clj)

(empty-rdd)
(empty-rdd spark)

Params:

Result: JavaRDD[T]

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html

Timestamp: 2020-10-19T01:56:49.505Z


empty? (clj)

(empty? rdd)

Params: ()

Result: Boolean

Returns true if and only if the RDD contains no elements at all. Note that an RDD may be empty even when it has at least 1 partition.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.862Z


filter (clj)

(filter rdd f)

Params: (f: Function[T, Boolean])

Result: JavaRDD[T]

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.832Z

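A sketch, assuming a plain Clojure predicate is accepted for f:

;; assumes (require '[zero-one.geni.rdd :as rdd])
(-> (rdd/parallelise (range 10))
    (rdd/filter even?)
    rdd/collect)
;; keeps only the even numbers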

final-value (clj)

(final-value result)

Params: ()

Result: R

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/partial/PartialResult.html

Timestamp: 2020-10-19T01:56:47.226Z


final? (clj)

(final? result)

Params:

Result: Boolean

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/partial/PartialResult.html

Timestamp: 2020-10-19T01:56:47.229Z


first (clj)

(first rdd)

Params: ()

Result: T

Return the first element in this RDD.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.839Z


flat-map (clj)

(flat-map rdd f)

Params: (f: FlatMapFunction[T, U])

Result: JavaRDD[U]

Return a new RDD by first applying a function to all elements of this RDD, and then flattening the results.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.840Z

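A word-splitting sketch, assuming the mapped function may return any Clojure sequence to be flattened:

;; assumes (require '[zero-one.geni.rdd :as rdd]
;;                  '[clojure.string :as string])
(-> (rdd/parallelise ["hello world" "foo bar"])
    (rdd/flat-map #(string/split % #" "))
    rdd/collect)
;; one element per word: "hello" "world" "foo" "bar"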

flat-map-to-pair (clj)

(flat-map-to-pair rdd f)

Params: (f: PairFlatMapFunction[T, K2, V2])

Result: JavaPairRDD[K2, V2]

Return a new RDD by first applying a function to all elements of this RDD, and then flattening the results.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.842Z


flat-map-values (clj)

(flat-map-values rdd f)

Params: (f: FlatMapFunction[V, U])

Result: JavaPairRDD[K, U]

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaPairRDD.html

Timestamp: 2020-10-19T01:56:48.082Z


fold (clj)

(fold rdd zero f)

Params: (zeroValue: T)

(f: Function2[T, T, T])

Result: T

Aggregate the elements of each partition, and then the results for all the partitions, using a given associative function and a neutral "zero value". The function op(t1, t2) is allowed to modify t1 and return it as its result value to avoid object allocation; however, it should not modify t2.

This behaves somewhat differently from fold operations implemented for non-distributed collections in functional languages like Scala. This fold operation may be applied to partitions individually, and then fold those results into the final result, rather than apply the fold to each element sequentially in some defined ordering. For functions that are not commutative, the result may differ from that of a fold applied to a non-distributed collection.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.844Z

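For example (a sketch, assuming the rdd alias used above), folding with a zero value of 0 and + sums the elements:

;; assumes (require '[zero-one.geni.rdd :as rdd])
(rdd/fold (rdd/parallelise [1 2 3 4]) 0 +)
;; + is associative and commutative, so the per-partition folds
;; combine into the same result as a sequential fold: 10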

fold-by-key (clj)

(fold-by-key rdd zero f)
(fold-by-key rdd zero partitions-or-partitioner f)

Params: (zeroValue: V, partitioner: Partitioner, func: Function2[V, V, V])

Result: JavaPairRDD[K, V]

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaPairRDD.html

Timestamp: 2020-10-19T01:56:48.088Z


foreach (clj)

(foreach rdd f)

Params: (f: VoidFunction[T])

Result: Unit

Applies a function f to all elements of this RDD.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.845Z


foreach-async (clj)

(foreach-async rdd f)

Params: (f: VoidFunction[T])

Result: JavaFutureAction[Void]

The asynchronous version of the foreach action, which applies a function f to all the elements of this RDD.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.846Z


foreach-partition (clj)

(foreach-partition rdd f)

Params: (f: VoidFunction[Iterator[T]])

Result: Unit

Applies a function f to each partition of this RDD.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.847Z


foreach-partition-async (clj)

(foreach-partition-async rdd f)

Params: (f: VoidFunction[Iterator[T]])

Result: JavaFutureAction[Void]

The asynchronous version of the foreachPartition action, which applies a function f to each partition of this RDD.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.849Z


full-outer-join (clj)

(full-outer-join left right)
(full-outer-join left right partitions-or-partitioner)

Params: (other: JavaPairRDD[K, W], partitioner: Partitioner)

Result: JavaPairRDD[K, (Optional[V], Optional[W])]

Perform a full outer join of this and other. For each element (k, v) in this, the resulting RDD will either contain all pairs (k, (Some(v), Some(w))) for w in other, or the pair (k, (Some(v), None)) if no elements in other have key k. Similarly, for each element (k, w) in other, the resulting RDD will either contain all pairs (k, (Some(v), Some(w))) for v in this, or the pair (k, (None, Some(w))) if no elements in this have key k. Uses the given Partitioner to partition the output RDD.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaPairRDD.html

Timestamp: 2020-10-19T01:56:48.102Z


get-num-partitions (clj)

(get-num-partitions rdd)

Params:

Result: Int

Return the number of partitions in this RDD.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.852Z


get-storage-level (clj)

(get-storage-level rdd)

Params:

Result: StorageLevel

Get the RDD's current storage level, or StorageLevel.NONE if none is set.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.853Z


glom (clj)

(glom rdd)

Params: ()

Result: JavaRDD[List[T]]

Return an RDD created by coalescing all elements within each partition into an array.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.854Z


group-by (clj)

(group-by rdd f)
(group-by rdd f num-partitions)

Params: (f: Function[T, U])

Result: JavaPairRDD[U, Iterable[T]]

Return an RDD of grouped elements. Each group consists of a key and a sequence of elements mapping to that key.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.857Z


group-by-key (clj)

(group-by-key rdd)
(group-by-key rdd num-partitions)

Params: (partitioner: Partitioner)

Result: JavaPairRDD[K, Iterable[V]]

Group the values for each key in the RDD into a single sequence. Allows controlling the partitioning of the resulting key-value pair RDD by passing a Partitioner.

If you are grouping in order to perform an aggregation (such as a sum or average) over each key, using JavaPairRDD.reduceByKey or JavaPairRDD.combineByKey will provide much better performance.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaPairRDD.html

Timestamp: 2020-10-19T01:56:48.115Z

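A sketch, assuming the pair RDD is built with key-by as in the earlier examples:

;; assumes (require '[zero-one.geni.rdd :as rdd])
(def pairs (rdd/key-by (rdd/parallelise (range 6)) even?))
(-> pairs
    rdd/group-by-key
    rdd/collect)
;; one entry per key (true/false), each with the grouped values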

id (clj)

(id rdd)

Params:

Result: Int

A unique ID for this RDD (within its SparkContext).

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.859Z


initial-value (clj)

(initial-value result)

Params:

Result: R

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/partial/PartialResult.html

Timestamp: 2020-10-19T01:56:47.228Z


intersection (clj)

(intersection)
(intersection rdd)
(intersection left right)
(intersection left right & rdds)

Params: (other: JavaRDD[T])

Result: JavaRDD[T]

Return the intersection of this RDD and another one. The output will not contain any duplicate elements, even if the input RDDs did.

This method performs a shuffle internally.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.860Z


is-checkpointed (clj)

(is-checkpointed rdd)

Params:

Result: Boolean

Return whether this RDD has been checkpointed or not

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.861Z


is-empty (clj)

(is-empty rdd)

Params: ()

Result: Boolean

Returns true if and only if the RDD contains no elements at all. Note that an RDD may be empty even when it has at least 1 partition.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.862Z


is-initial-value-final (clj)

(is-initial-value-final result)

Params:

Result: Boolean

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/partial/PartialResult.html

Timestamp: 2020-10-19T01:56:47.229Z


is-local (clj)

(is-local)
(is-local spark)

Params:

Result: Boolean

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html

Timestamp: 2020-10-19T01:56:49.531Z


jars (clj)

(jars)
(jars spark)

Params:

Result: List[String]

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html

Timestamp: 2020-10-19T01:56:49.532Z


java-spark-context (clj)

(java-spark-context spark)

Converts a SparkSession to a JavaSparkContext.


join (clj)

(join left right)
(join left right partitions-or-partitioner)

Params: (other: JavaPairRDD[K, W], partitioner: Partitioner)

Result: JavaPairRDD[K, (V, W)]

Return an RDD containing all pairs of elements with matching keys in this and other. Each pair of elements will be returned as a (k, (v1, v2)) tuple, where (k, v1) is in this and (k, v2) is in other. Uses the given Partitioner to partition the output RDD.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaPairRDD.html

Timestamp: 2020-10-19T01:56:48.131Z

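A sketch joining two pair RDDs keyed by their first character (key-by is documented just below; the rdd alias is assumed as above):

;; assumes (require '[zero-one.geni.rdd :as rdd])
(def left  (rdd/key-by (rdd/parallelise ["apple" "avocado" "banana"]) first))
(def right (rdd/key-by (rdd/parallelise ["apricot" "blueberry"]) first))
(-> (rdd/join left right)
    rdd/collect)
;; only keys present on both sides survive: \a and \b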

key-by (clj)

(key-by rdd f)

Params: (f: Function[T, U])

Result: JavaPairRDD[U, T]

Creates tuples of the elements in this RDD by applying f.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.865Z

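For instance (a sketch, assuming the rdd alias used above), keying strings by their length:

;; assumes (require '[zero-one.geni.rdd :as rdd])
(-> (rdd/parallelise ["a" "bb" "cc" "ddd"])
    (rdd/key-by count)
    rdd/collect)
;; pairs of (length, string), e.g. (2, "bb")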

keys (clj)

(keys rdd)

Params: ()

Result: JavaRDD[K]

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaPairRDD.html

Timestamp: 2020-10-19T01:56:48.139Z


left-outer-join (clj)

(left-outer-join left right)
(left-outer-join left right partitions-or-partitioner)

Params: (other: JavaPairRDD[K, W], partitioner: Partitioner)

Result: JavaPairRDD[K, (V, Optional[W])]

Perform a left outer join of this and other. For each element (k, v) in this, the resulting RDD will either contain all pairs (k, (v, Some(w))) for w in other, or the pair (k, (v, None)) if no elements in other have key k. Uses the given Partitioner to partition the output RDD.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaPairRDD.html

Timestamp: 2020-10-19T01:56:48.144Z


local-property (clj)

(local-property k)
(local-property spark k)

Params: (key: String)

Result: String

Get a local property set in this thread, or null if it is missing. See org.apache.spark.api.java.JavaSparkContext.setLocalProperty.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html

Timestamp: 2020-10-19T01:56:49.512Z


local? (clj)

(local?)
(local? spark)

Params:

Result: Boolean

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html

Timestamp: 2020-10-19T01:56:49.531Z


lookup (clj)

(lookup rdd k)

Params: (key: K)

Result: List[V]

Return the list of values in the RDD for key key. This operation is done efficiently if the RDD has a known partitioner by only searching the partition that the key maps to.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaPairRDD.html

Timestamp: 2020-10-19T01:56:48.145Z


map (clj multimethod)

Params: (f: Function[T, R])

Result: JavaRDD[R]

Return a new RDD by applying a function to all elements of this RDD.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.867Z


map-partitions (clj)

(map-partitions rdd f)
(map-partitions rdd f preserves-partitioning)

Params: (f: FlatMapFunction[Iterator[T], U])

Result: JavaRDD[U]

Return a new RDD by applying a function to each partition of this RDD.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.870Z

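A hedged sketch; this assumes each partition is handed to the Clojure function as a Java iterator (so iterator-seq is needed) and that a Clojure sequence may be returned:

;; assumes (require '[zero-one.geni.rdd :as rdd])
(-> (rdd/parallelise (range 10))
    (rdd/map-partitions
      (fn [part]
        ;; assumption: part is a java.util.Iterator over the partition
        (map inc (iterator-seq part))))
    rdd/collect)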

map-partitions-to-pair (clj)

(map-partitions-to-pair rdd f)
(map-partitions-to-pair rdd f preserves-partitioning)

Params: (f: PairFlatMapFunction[Iterator[T], K2, V2])

Result: JavaPairRDD[K2, V2]

Return a new RDD by applying a function to each partition of this RDD.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.875Z


map-partitions-with-index (clj)

(map-partitions-with-index rdd f)
(map-partitions-with-index rdd f preserves-partitioning)

Params: (f: Function2[Integer, Iterator[T], Iterator[R]], preservesPartitioning: Boolean = false)

Result: JavaRDD[R]

Return a new RDD by applying a function to each partition of this RDD, while tracking the index of the original partition.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.877Z


map-to-pair (clj)

(map-to-pair rdd f)

Params: (f: PairFunction[T, K2, V2])

Result: JavaPairRDD[K2, V2]

Return a new RDD by applying a function to all elements of this RDD.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.883Z

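A hedged sketch of a word-count-style pairing. It assumes the function must return a scala.Tuple2 (the type the underlying PairFunction expects); the Geni wrapper may also accept a plain Clojure pair:

;; assumes (require '[zero-one.geni.rdd :as rdd])
(-> (rdd/parallelise ["a" "b" "a"])
    (rdd/map-to-pair (fn [w] (scala.Tuple2. w 1)))  ;; assumption: Tuple2 pairs
    rdd/count-by-key)
;; counts per key: "a" -> 2, "b" -> 1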

map-values (clj)

(map-values rdd f)

Params: (f: Function[V, U])

Result: JavaPairRDD[K, U]

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaPairRDD.html

Timestamp: 2020-10-19T01:56:48.160Z


mapcat (clj)

(mapcat rdd f)

Params: (f: FlatMapFunction[T, U])

Result: JavaRDD[U]

Return a new RDD by first applying a function to all elements of this RDD, and then flattening the results.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.840Z


mapcat-to-pair (clj)

(mapcat-to-pair rdd f)

Params: (f: PairFlatMapFunction[T, K2, V2])

Result: JavaPairRDD[K2, V2]

Return a new RDD by first applying a function to all elements of this RDD, and then flattening the results.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.842Z


master (clj)

(master)
(master spark)

Params:

Result: String

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html

Timestamp: 2020-10-19T01:56:49.532Z


max (clj)

(max rdd cmp)

Params: (comp: Comparator[T])

Result: T

Returns the maximum element from this RDD as defined by the specified Comparator[T].

comp: the comparator that defines ordering.

Returns the maximum of the RDD.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.884Z


memory-and-disk (clj)

Flag for controlling the storage of an RDD.

The default behaviour for a DataFrame or Dataset. At this storage level, the DataFrame is stored in JVM memory as deserialized objects. When the required storage is greater than the available memory, the excess partitions are stored on disk and read back from disk when needed. This is slower because of the I/O involved.


memory-and-disk-2 (clj)

Flag for controlling the storage of an RDD.

Same as memory-and-disk storage level but replicate each partition to two cluster nodes.


memory-and-disk-ser (clj)

Flag for controlling the storage of an RDD.

Same as the memory-and-disk storage level, the difference being that it serializes the DataFrame objects in memory and spills to disk when space is not available.


memory-and-disk-ser-2 (clj)

Flag for controlling the storage of an RDD.

Same as memory-and-disk-ser storage level but replicate each partition to two cluster nodes.


memory-only (clj)

Flag for controlling the storage of an RDD.


memory-only-2 (clj)

Flag for controlling the storage of an RDD.

Same as memory-only storage level but replicate each partition to two cluster nodes.


memory-only-ser (clj)

Flag for controlling the storage of an RDD.

Same as memory-only, the difference being that the RDD is stored in JVM memory as serialized objects. It takes less memory (more space-efficient) than memory-only, since objects are kept serialized, at the cost of a few extra CPU cycles to deserialize them.


memory-only-ser-2 (clj)

Flag for controlling the storage of an RDD.

Same as memory-only-ser storage level but replicate each partition to two cluster nodes.

Flag for controlling the storage of an RDD.

Same as `memory-only-ser` storage level but replicate each partition to two cluster nodes.
sourceraw docstring

minclj

(min rdd cmp)

Params: (comp: Comparator[T])

Result: T

Returns the minimum element from this RDD as defined by the specified Comparator[T].

the comparator that defines ordering

the minimum of the RDD

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.885Z

Params: (comp: Comparator[T])

Result: T

Returns the minimum element from this RDD as defined by the specified
Comparator[T].


the comparator that defines ordering

the minimum of the RDD

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.885Z
sourceraw docstring
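
A minimal usage sketch for min (and the analogous max above), assuming the namespace is required as rdd and that Clojure's compare can stand in for the Comparator[T] (Clojure functions implement java.util.Comparator); on a real cluster the comparator must also be serializable:

  (require '[zero-one.geni.rdd :as rdd])

  (rdd/min (rdd/parallelise [5 1 4 2 3]) compare) ;; the smallest element, 1
  (rdd/max (rdd/parallelise [5 1 4 2 3]) compare) ;; the largest element, 5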

nameclj

(name rdd)

Params: ()

Result: String

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.886Z

Params: ()

Result: String



Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.886Z
sourceraw docstring

noneclj

Flag for controlling the storage of an RDD.

No caching.

Flag for controlling the storage of an RDD.

No caching.
sourceraw docstring

num-partitionsclj

(num-partitions rdd)

Params:

Result: Int

Return the number of partitions in this RDD.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.852Z

Params: 

Result: Int

Return the number of partitions in this RDD.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.852Z
sourceraw docstring

off-heapclj

Flag for controlling the storage of an RDD.

Off-heap refers to objects (serialised to a byte array) that are managed by the operating system but stored outside the process heap in native memory (therefore, they are not processed by the garbage collector). Accessing this data is slightly slower than accessing the on-heap storage but still faster than reading/writing from a disk. The downside is that the user has to manually manage the allocated memory.

Flag for controlling the storage of an RDD.

Off-heap refers to objects (serialised to a byte array) that are managed by the operating system but stored outside the process heap in native memory (therefore, they are not processed by the garbage collector). Accessing this data is slightly slower than accessing the on-heap storage but still faster than reading/writing from a disk. The downside is that the user has to manually manage the allocated memory.
sourceraw docstring

paralleliseclj

(parallelise data)
(parallelise spark data)

Params: (list: List[T], numSlices: Int)

Result: JavaRDD[T]

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html

Timestamp: 2020-10-19T01:56:49.544Z

Params: (list: List[T], numSlices: Int)

Result: JavaRDD[T]



Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html

Timestamp: 2020-10-19T01:56:49.544Z
sourceraw docstring
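
A minimal sketch of building an RDD from a local collection with the namespace aliased as rdd; with the single-argument arity the default SparkSession is assumed:

  (def xs (rdd/parallelise [1 2 3 4 5]))

  (rdd/num-partitions xs) ;; partition count chosen by Spark
  (rdd/take xs 3)         ;; the first three elements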

parallelise-doublesclj

(parallelise-doubles data)
(parallelise-doubles spark data)

Params: (list: List[Double], numSlices: Int)

Result: JavaDoubleRDD

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html

Timestamp: 2020-10-19T01:56:49.546Z

Params: (list: List[Double], numSlices: Int)

Result: JavaDoubleRDD



Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html

Timestamp: 2020-10-19T01:56:49.546Z
sourceraw docstring

parallelise-pairsclj

(parallelise-pairs data)
(parallelise-pairs spark data)

Params: (list: List[(K, V)], numSlices: Int)

Result: JavaPairRDD[K, V]

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html

Timestamp: 2020-10-19T01:56:49.549Z

Params: (list: List[(K, V)], numSlices: Int)

Result: JavaPairRDD[K, V]



Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html

Timestamp: 2020-10-19T01:56:49.549Z
sourceraw docstring
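
A sketch of creating a pair RDD, assuming key-value pairs are passed as two-element vectors:

  (def kvs (rdd/parallelise-pairs [["a" 1] ["b" 2] ["a" 3]]))

  (rdd/take (rdd/vals kvs) 3) ;; the values 1, 2 and 3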

parallelizeclj

(parallelize data)
(parallelize spark data)

Params: (list: List[T], numSlices: Int)

Result: JavaRDD[T]

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html

Timestamp: 2020-10-19T01:56:49.544Z

Params: (list: List[T], numSlices: Int)

Result: JavaRDD[T]



Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html

Timestamp: 2020-10-19T01:56:49.544Z
sourceraw docstring

parallelize-doublesclj

(parallelize-doubles data)
(parallelize-doubles spark data)

Params: (list: List[Double], numSlices: Int)

Result: JavaDoubleRDD

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html

Timestamp: 2020-10-19T01:56:49.546Z

Params: (list: List[Double], numSlices: Int)

Result: JavaDoubleRDD



Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html

Timestamp: 2020-10-19T01:56:49.546Z
sourceraw docstring

parallelize-pairsclj

(parallelize-pairs data)
(parallelize-pairs spark data)

Params: (list: List[(K, V)], numSlices: Int)

Result: JavaPairRDD[K, V]

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html

Timestamp: 2020-10-19T01:56:49.549Z

Params: (list: List[(K, V)], numSlices: Int)

Result: JavaPairRDD[K, V]



Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html

Timestamp: 2020-10-19T01:56:49.549Z
sourceraw docstring

partition-byclj

(partition-by rdd partitioner)

Params: (partitioner: Partitioner)

Result: JavaPairRDD[K, V]

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaPairRDD.html

Timestamp: 2020-10-19T01:56:48.168Z

Params: (partitioner: Partitioner)

Result: JavaPairRDD[K, V]



Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaPairRDD.html

Timestamp: 2020-10-19T01:56:48.168Z
sourceraw docstring

partitionerclj

(partitioner rdd)

Params:

Result: Optional[Partitioner]

The partitioner of this RDD.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.890Z

Params: 

Result: Optional[Partitioner]

The partitioner of this RDD.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.890Z
sourceraw docstring

partitionsclj

(partitions rdd)

Params:

Result: List[Partition]

Set of partitions in this RDD.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.891Z

Params: 

Result: List[Partition]

Set of partitions in this RDD.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.891Z
sourceraw docstring

persistclj

(persist rdd storage)

Params: (newLevel: StorageLevel)

Result: JavaRDD[T]

Set this RDD's storage level to persist its values across operations after the first time it is computed. This can only be used to assign a new storage level if the RDD does not have a storage level set yet.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.892Z

Params: (newLevel: StorageLevel)

Result: JavaRDD[T]

Set this RDD's storage level to persist its values across operations after the first time
it is computed. This can only be used to assign a new storage level if the RDD does not
have a storage level set yet.


Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.892Z
sourceraw docstring
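
A sketch of persisting an RDD with one of the storage-level flags documented above, checking the level, and releasing the blocks again:

  (def cached (rdd/persist (rdd/parallelise (range 100)) rdd/memory-and-disk))

  (rdd/storage-level cached) ;; the StorageLevel that was just set
  (rdd/unpersist cached)     ;; mark non-persistent and drop the blocks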

persistent-rddsclj

(persistent-rdds)
(persistent-rdds spark)

Params:

Result: Map[Integer, JavaRDD[_]]

Returns a Java map of JavaRDDs that have marked themselves as persistent via cache() call.

This does not necessarily mean the caching or computation was successful.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html

Timestamp: 2020-10-19T01:56:49.513Z

Params: 

Result: Map[Integer, JavaRDD[_]]

Returns a Java map of JavaRDDs that have marked themselves as persistent via cache() call.


This does not necessarily mean the caching or computation was successful.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html

Timestamp: 2020-10-19T01:56:49.513Z
sourceraw docstring

random-splitclj

(random-split rdd weights)
(random-split rdd weights seed)

Params: (weights: Array[Double])

Result: Array[JavaRDD[T]]

Randomly splits this RDD with the provided weights.

weights for splits, will be normalized if they don't sum to 1

split RDDs in an array

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.902Z

Params: (weights: Array[Double])

Result: Array[JavaRDD[T]]

Randomly splits this RDD with the provided weights.


weights for splits, will be normalized if they don't sum to 1

split RDDs in an array

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.902Z
sourceraw docstring
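
A sketch of splitting an RDD into train/test subsets; the weights are normalised if they do not sum to 1, and the optional seed makes the split reproducible:

  (let [[train test] (rdd/random-split (rdd/parallelise (range 100)) [0.8 0.2] 42)]
    [(rdd/take train 3) (rdd/take test 3)]) ;; a peek at each split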

rdd?clj

(rdd? value)

Tests if value is an instance of JavaRDD.

Tests if `value` is an instance of `JavaRDD`.
sourceraw docstring

reduceclj

(reduce rdd f)

Params: (f: Function2[T, T, T])

Result: T

Reduces the elements of this RDD using the specified commutative and associative binary operator.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.904Z

Params: (f: Function2[T, T, T])

Result: T

Reduces the elements of this RDD using the specified commutative and associative binary
operator.


Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.904Z
sourceraw docstring
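
A minimal sketch of reduce with a commutative and associative Clojure function; on a cluster the function must be serializable:

  (rdd/reduce (rdd/parallelise [1 2 3 4]) +) ;; sums to 10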

reduce-by-keyclj

(reduce-by-key rdd f)
(reduce-by-key rdd f partitions-or-partitioner)

Params: (partitioner: Partitioner, func: Function2[V, V, V])

Result: JavaPairRDD[K, V]

Merge the values for each key using an associative and commutative reduce function. This will also perform the merging locally on each mapper before sending results to a reducer, similarly to a "combiner" in MapReduce.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaPairRDD.html

Timestamp: 2020-10-19T01:56:48.188Z

Params: (partitioner: Partitioner, func: Function2[V, V, V])

Result: JavaPairRDD[K, V]

Merge the values for each key using an associative and commutative reduce function. This will
also perform the merging locally on each mapper before sending results to a reducer, similarly
to a "combiner" in MapReduce.


Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaPairRDD.html

Timestamp: 2020-10-19T01:56:48.188Z
sourceraw docstring
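
A sketch of a per-key sum with reduce-by-key on a small pair RDD:

  (def sums
    (rdd/reduce-by-key (rdd/parallelise-pairs [["a" 1] ["b" 2] ["a" 3]]) +))

  (rdd/take sums 2) ;; "a" maps to 4 and "b" to 2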

reduce-by-key-locallyclj

(reduce-by-key-locally rdd f)

Params: (func: Function2[V, V, V])

Result: Map[K, V]

Merge the values for each key using an associative and commutative reduce function, but return the result immediately to the master as a Map. This will also perform the merging locally on each mapper before sending results to a reducer, similarly to a "combiner" in MapReduce.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaPairRDD.html

Timestamp: 2020-10-19T01:56:48.190Z

Params: (func: Function2[V, V, V])

Result: Map[K, V]

Merge the values for each key using an associative and commutative reduce function, but return
the result immediately to the master as a Map. This will also perform the merging locally on
each mapper before sending results to a reducer, similarly to a "combiner" in MapReduce.


Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaPairRDD.html

Timestamp: 2020-10-19T01:56:48.190Z
sourceraw docstring

repartitionclj

(repartition rdd num-partitions)

Params: (numPartitions: Int)

Result: JavaRDD[T]

Return a new RDD that has exactly numPartitions partitions.

Can increase or decrease the level of parallelism in this RDD. Internally, this uses a shuffle to redistribute data.

If you are decreasing the number of partitions in this RDD, consider using coalesce, which can avoid performing a shuffle.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.905Z

Params: (numPartitions: Int)

Result: JavaRDD[T]

Return a new RDD that has exactly numPartitions partitions.

Can increase or decrease the level of parallelism in this RDD. Internally, this uses
a shuffle to redistribute data.

If you are decreasing the number of partitions in this RDD, consider using coalesce,
which can avoid performing a shuffle.


Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.905Z
sourceraw docstring
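
A sketch pairing repartition with num-partitions to confirm the new partition count; when decreasing partitions, coalesce may avoid the shuffle, as noted above:

  (def repartitioned (rdd/repartition (rdd/parallelise (range 100)) 8))

  (rdd/num-partitions repartitioned) ;; => 8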

repartition-and-sort-within-partitionsclj

(repartition-and-sort-within-partitions rdd partitioner)
(repartition-and-sort-within-partitions rdd partitioner cmp)

Params: (partitioner: Partitioner)

Result: JavaPairRDD[K, V]

Repartition the RDD according to the given partitioner and, within each resulting partition, sort records by their keys.

This is more efficient than calling repartition and then sorting within each partition because it can push the sorting down into the shuffle machinery.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaPairRDD.html

Timestamp: 2020-10-19T01:56:48.193Z

Params: (partitioner: Partitioner)

Result: JavaPairRDD[K, V]

Repartition the RDD according to the given partitioner and, within each resulting partition,
sort records by their keys.

This is more efficient than calling repartition and then sorting within each partition
because it can push the sorting down into the shuffle machinery.


Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaPairRDD.html

Timestamp: 2020-10-19T01:56:48.193Z
sourceraw docstring

resourcesclj

(resources)
(resources spark)

Params:

Result: Map[String, ResourceInformation]

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html

Timestamp: 2020-10-19T01:56:49.550Z

Params: 

Result: Map[String, ResourceInformation]



Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html

Timestamp: 2020-10-19T01:56:49.550Z
sourceraw docstring

right-outer-joinclj

(right-outer-join left right)
(right-outer-join left right partitions-or-partitioner)

Params: (other: JavaPairRDD[K, W], partitioner: Partitioner)

Result: JavaPairRDD[K, (Optional[V], W)]

Perform a right outer join of this and other. For each element (k, w) in other, the resulting RDD will either contain all pairs (k, (Some(v), w)) for v in this, or the pair (k, (None, w)) if no elements in this have key k. Uses the given Partitioner to partition the output RDD.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaPairRDD.html

Timestamp: 2020-10-19T01:56:48.198Z

Params: (other: JavaPairRDD[K, W], partitioner: Partitioner)

Result: JavaPairRDD[K, (Optional[V], W)]

Perform a right outer join of this and other. For each element (k, w) in other, the
resulting RDD will either contain all pairs (k, (Some(v), w)) for v in this, or the
pair (k, (None, w)) if no elements in this have key k. Uses the given Partitioner to
partition the output RDD.


Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaPairRDD.html

Timestamp: 2020-10-19T01:56:48.198Z
sourceraw docstring
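
A sketch of a right outer join of two pair RDDs; a key present only on the right comes back with an absent left value:

  (def left  (rdd/parallelise-pairs [["a" 1]]))
  (def right (rdd/parallelise-pairs [["a" 10] ["b" 20]]))

  (rdd/take (rdd/right-outer-join left right) 2)
  ;; "a" joins to (1, 10); "b" joins to (absent, 20), the left side being an Optional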

sampleclj

(sample rdd with-replacement fraction)
(sample rdd with-replacement fraction seed)

Params: (withReplacement: Boolean, fraction: Double)

Result: JavaRDD[T]

Return a sampled subset of this RDD with a random seed.

can elements be sampled multiple times (replaced when sampled out)

expected size of the sample as a fraction of this RDD's size. Without replacement: probability that each element is chosen; fraction must be in [0, 1]. With replacement: expected number of times each element is chosen; fraction must be greater than or equal to 0.

This is NOT guaranteed to provide exactly the fraction of the count of the given RDD.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.908Z

Params: (withReplacement: Boolean, fraction: Double)

Result: JavaRDD[T]

Return a sampled subset of this RDD with a random seed.


can elements be sampled multiple times (replaced when sampled out)

expected size of the sample as a fraction of this RDD's size
 without replacement: probability that each element is chosen; fraction must be [0, 1]
 with replacement: expected number of times each element is chosen; fraction must be greater
 than or equal to 0

This is NOT guaranteed to provide exactly the fraction of the count
of the given RDD.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.908Z
sourceraw docstring
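
A sketch of sampling roughly 10% of an RDD without replacement, with a fixed seed for reproducibility; as noted above, the exact fraction is not guaranteed:

  (rdd/take (rdd/sample (rdd/parallelise (range 100)) false 0.1 42) 5)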

sample-by-keyclj

(sample-by-key rdd with-replacement fractions)
(sample-by-key rdd with-replacement fractions seed)

Params: (withReplacement: Boolean, fractions: Map[K, Double], seed: Long)

Result: JavaPairRDD[K, V]

Return a subset of this RDD sampled by key (via stratified sampling).

Create a sample of this RDD using variable sampling rates for different keys as specified by fractions, a key to sampling rate map, via simple random sampling with one pass over the RDD, to produce a sample of size that's approximately equal to the sum of math.ceil(numItems * samplingRate) over all key values.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaPairRDD.html

Timestamp: 2020-10-19T01:56:48.203Z

Params: (withReplacement: Boolean, fractions: Map[K, Double], seed: Long)

Result: JavaPairRDD[K, V]

Return a subset of this RDD sampled by key (via stratified sampling).

Create a sample of this RDD using variable sampling rates for different keys as specified by
fractions, a key to sampling rate map, via simple random sampling with one pass over the
RDD, to produce a sample of size that's approximately equal to the sum of
math.ceil(numItems * samplingRate) over all key values.


Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaPairRDD.html

Timestamp: 2020-10-19T01:56:48.203Z
sourceraw docstring
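
A sketch of stratified sampling, assuming the per-key fractions are passed as an ordinary Clojure map:

  (rdd/sample-by-key
    (rdd/parallelise-pairs [["a" 1] ["a" 2] ["b" 3]])
    false
    {"a" 0.5 "b" 1.0}
    42)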

sample-by-key-exactclj

(sample-by-key-exact rdd with-replacement fractions)
(sample-by-key-exact rdd with-replacement fractions seed)

Params: (withReplacement: Boolean, fractions: Map[K, Double], seed: Long)

Result: JavaPairRDD[K, V]

Return a subset of this RDD sampled by key (via stratified sampling) containing exactly math.ceil(numItems * samplingRate) for each stratum (group of pairs with the same key).

This method differs from sampleByKey in that we make additional passes over the RDD to create a sample size that's exactly equal to the sum of math.ceil(numItems * samplingRate) over all key values with a 99.99% confidence. When sampling without replacement, we need one additional pass over the RDD to guarantee sample size; when sampling with replacement, we need two additional passes.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaPairRDD.html

Timestamp: 2020-10-19T01:56:48.206Z

Params: (withReplacement: Boolean, fractions: Map[K, Double], seed: Long)

Result: JavaPairRDD[K, V]

Return a subset of this RDD sampled by key (via stratified sampling) containing exactly
math.ceil(numItems * samplingRate) for each stratum (group of pairs with the same key).

This method differs from sampleByKey in that we make additional passes over the RDD to
create a sample size that's exactly equal to the sum of math.ceil(numItems * samplingRate)
over all key values with a 99.99% confidence. When sampling without replacement, we need one
additional pass over the RDD to guarantee sample size; when sampling with replacement, we need
two additional passes.


Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaPairRDD.html

Timestamp: 2020-10-19T01:56:48.206Z
sourceraw docstring

save-as-text-fileclj

(save-as-text-file rdd path)

Params: (path: String)

Result: Unit

Save this RDD as a text file, using string representations of elements.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.911Z

Params: (path: String)

Result: Unit

Save this RDD as a text file, using string representations of elements.


Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.911Z
sourceraw docstring

scclj

(sc)
(sc spark)

Params:

Result: SparkContext

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html

Timestamp: 2020-10-19T01:56:49.550Z

Params: 

Result: SparkContext



Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html

Timestamp: 2020-10-19T01:56:49.550Z
sourceraw docstring

sort-by-keyclj

(sort-by-key rdd)
(sort-by-key rdd asc)

Params: ()

Result: JavaPairRDD[K, V]

Sort the RDD by key, so that each partition contains a sorted range of the elements in ascending order. Calling collect or save on the resulting RDD will return or output an ordered list of records (in the save case, they will be written to multiple part-X files in the filesystem, in order of the keys).

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaPairRDD.html

Timestamp: 2020-10-19T01:56:48.231Z

Params: ()

Result: JavaPairRDD[K, V]

Sort the RDD by key, so that each partition contains a sorted range of the elements in
ascending order. Calling collect or save on the resulting RDD will return or output an
ordered list of records (in the save case, they will be written to multiple part-X files
in the filesystem, in order of the keys).


Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaPairRDD.html

Timestamp: 2020-10-19T01:56:48.231Z
sourceraw docstring
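
A sketch of sorting a pair RDD by key; the optional second argument selects ascending (true, the default) or descending (false) order:

  (def pairs (rdd/parallelise-pairs [["b" 2] ["a" 1] ["c" 3]]))

  (rdd/take (rdd/sort-by-key pairs) 3)       ;; keys in order a, b, c
  (rdd/take (rdd/sort-by-key pairs false) 3) ;; keys in order c, b, a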

spark-contextclj

(spark-context)
(spark-context spark)

Params:

Result: SparkContext

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html

Timestamp: 2020-10-19T01:56:49.550Z

Params: 

Result: SparkContext



Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html

Timestamp: 2020-10-19T01:56:49.550Z
sourceraw docstring

spark-homeclj

(spark-home)
(spark-home spark)

Params: ()

Result: Optional[String]

Get Spark's home location from either a value set through the constructor, or the spark.home Java property, or the SPARK_HOME environment variable (in that order of preference). If neither of these is set, return None.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html

Timestamp: 2020-10-19T01:56:49.518Z

Params: ()

Result: Optional[String]

Get Spark's home location from either a value set through the constructor,
or the spark.home Java property, or the SPARK_HOME environment variable
(in that order of preference). If neither of these is set, return None.


Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html

Timestamp: 2020-10-19T01:56:49.518Z
sourceraw docstring

storage-levelclj

(storage-level rdd)

Params:

Result: StorageLevel

Get the RDD's current storage level, or StorageLevel.NONE if none is set.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.853Z

Params: 

Result: StorageLevel

Get the RDD's current storage level, or StorageLevel.NONE if none is set.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.853Z
sourceraw docstring

subtractclj

(subtract)
(subtract rdd)
(subtract left right)
(subtract left right arg)
(subtract left right arg & rdds)

Params: (other: JavaRDD[T])

Result: JavaRDD[T]

Return an RDD with the elements from this that are not in other.

Uses this partitioner/partition size, because even if other is huge, the resulting RDD will be less than or equal to us.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.917Z

Params: (other: JavaRDD[T])

Result: JavaRDD[T]

Return an RDD with the elements from this that are not in other.

Uses this partitioner/partition size, because even if other is huge, the resulting
RDD will be less than or equal to us.


Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.917Z
sourceraw docstring
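
A sketch of subtract, which removes from one RDD the elements present in another:

  (rdd/take
    (rdd/subtract (rdd/parallelise [1 2 3 4 5]) (rdd/parallelise [2 4]))
    5) ;; 1, 3 and 5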

subtract-by-keyclj

(subtract-by-key left right)
(subtract-by-key left right partitions-or-partitioner)

Params: (other: JavaPairRDD[K, W])

Result: JavaPairRDD[K, V]

Return an RDD with the pairs from this whose keys are not in other.

Uses this partitioner/partition size, because even if other is huge, the resulting RDD will be <= us.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaPairRDD.html

Timestamp: 2020-10-19T01:56:48.240Z

Params: (other: JavaPairRDD[K, W])

Result: JavaPairRDD[K, V]

Return an RDD with the pairs from this whose keys are not in other.

Uses this partitioner/partition size, because even if other is huge, the resulting
RDD will be <= us.


Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaPairRDD.html

Timestamp: 2020-10-19T01:56:48.240Z
sourceraw docstring

takeclj

(take rdd n)

Params: (num: Int)

Result: List[T]

Take the first num elements of the RDD. This currently scans the partitions one by one, so it will be slow if a lot of partitions are required. In that case, use collect() to get the whole RDD instead.

this method should only be used if the resulting array is expected to be small, as all the data is loaded into the driver's memory.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.923Z

Params: (num: Int)

Result: List[T]

Take the first num elements of the RDD. This currently scans the partitions *one by one*, so
it will be slow if a lot of partitions are required. In that case, use collect() to get the
whole RDD instead.


this method should only be used if the resulting array is expected to be small, as
all the data is loaded into the driver's memory.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.923Z
sourceraw docstring

take-asyncclj

(take-async rdd n)

Params: (num: Int)

Result: JavaFutureAction[List[T]]

The asynchronous version of the take action, which returns a future for retrieving the first num elements of this RDD.

this method should only be used if the resulting array is expected to be small, as all the data is loaded into the driver's memory.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.924Z

Params: (num: Int)

Result: JavaFutureAction[List[T]]

The asynchronous version of the take action, which returns a
future for retrieving the first num elements of this RDD.


this method should only be used if the resulting array is expected to be small, as
all the data is loaded into the driver's memory.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.924Z
sourceraw docstring

take-orderedclj

(take-ordered rdd n)
(take-ordered rdd n cmp)

Params: (num: Int, comp: Comparator[T])

Result: List[T]

Returns the first k (smallest) elements from this RDD as defined by the specified Comparator[T] and maintains the order.

k, the number of elements to return

the comparator that defines the order

an array of top elements

this method should only be used if the resulting array is expected to be small, as all the data is loaded into the driver's memory.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.927Z

Params: (num: Int, comp: Comparator[T])

Result: List[T]

Returns the first k (smallest) elements from this RDD as defined by
the specified Comparator[T] and maintains the order.


k, the number of elements to return

the comparator that defines the order

an array of top elements

this method should only be used if the resulting array is expected to be small, as
all the data is loaded into the driver's memory.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.927Z
sourceraw docstring

take-sampleclj

(take-sample rdd with-replacement n)
(take-sample rdd with-replacement n seed)

Params: (withReplacement: Boolean, num: Int)

Result: List[T]

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.929Z

Params: (withReplacement: Boolean, num: Int)

Result: List[T]



Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.929Z
sourceraw docstring

text-filecljmultimethod

Params: (path: String)

Result: JavaRDD[String]

Read a text file from HDFS, a local file system (available on all nodes), or any Hadoop-supported file system URI, and return it as an RDD of Strings. The text files must be encoded as UTF-8.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html

Timestamp: 2020-10-19T01:56:49.570Z

Params: (path: String)

Result: JavaRDD[String]

Read a text file from HDFS, a local file system (available on all nodes), or any
Hadoop-supported file system URI, and return it as an RDD of Strings.
The text files must be encoded as UTF-8.


Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html

Timestamp: 2020-10-19T01:56:49.570Z
sourceraw docstring
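
A sketch of reading a text file into an RDD of lines and writing it back out; the paths here are placeholders, and both local paths and hdfs:// URIs are valid:

  (def lines (rdd/text-file "data/input.txt"))

  (rdd/take lines 5)                               ;; the first five lines
  (rdd/save-as-text-file lines "data/output-dir")  ;; writes one part-* file per partition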

topclj

(top rdd n)
(top rdd n cmp)

Params: (num: Int, comp: Comparator[T])

Result: List[T]

Returns the top k (largest) elements from this RDD as defined by the specified Comparator[T] and maintains the order.

k, the number of top elements to return

the comparator that defines the order

an array of top elements

this method should only be used if the resulting array is expected to be small, as all the data is loaded into the driver's memory.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.935Z

Params: (num: Int, comp: Comparator[T])

Result: List[T]

Returns the top k (largest) elements from this RDD as defined by
the specified Comparator[T] and maintains the order.


k, the number of top elements to return

the comparator that defines the order

an array of top elements

this method should only be used if the resulting array is expected to be small, as
all the data is loaded into the driver's memory.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.935Z
sourceraw docstring
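
A sketch contrasting take-ordered and top on a numeric RDD; both load their results into the driver, so keep n small:

  (def nums (rdd/parallelise [5 1 4 2 3]))

  (rdd/take-ordered nums 2) ;; the two smallest elements, 1 and 2
  (rdd/top nums 2)          ;; the two largest elements, 5 and 4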

unionclj

(union)
(union rdd)
(union left right)
(union left right & rdds)

Params: (other: JavaRDD[T])

Result: JavaRDD[T]

Return the union of this RDD and another one. Any identical elements will appear multiple times (use .distinct() to eliminate them).

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.942Z

Params: (other: JavaRDD[T])

Result: JavaRDD[T]

Return the union of this RDD and another one. Any identical elements will appear multiple
times (use .distinct() to eliminate them).


Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.942Z
sourceraw docstring
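
A sketch of union, which keeps duplicates unless distinct is applied afterwards:

  (rdd/take
    (rdd/union (rdd/parallelise [1 2]) (rdd/parallelise [2 3]))
    4) ;; the four elements 1, 2, 2 and 3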

unpersistclj

(unpersist rdd)
(unpersist rdd blocking)

Params: ()

Result: JavaRDD[T]

Mark the RDD as non-persistent, and remove all blocks for it from memory and disk. This method blocks until all blocks are deleted.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.944Z

Params: ()

Result: JavaRDD[T]

Mark the RDD as non-persistent, and remove all blocks for it from memory and disk.
This method blocks until all blocks are deleted.


Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.944Z
sourceraw docstring

valsclj

(vals rdd)

Params: ()

Result: JavaRDD[V]

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaPairRDD.html

Timestamp: 2020-10-19T01:56:48.266Z

Params: ()

Result: JavaRDD[V]



Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaPairRDD.html

Timestamp: 2020-10-19T01:56:48.266Z
sourceraw docstring

valueclj

memfn of value

memfn of value
sourceraw docstring

valuesclj

(values rdd)

Params: ()

Result: JavaRDD[V]

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaPairRDD.html

Timestamp: 2020-10-19T01:56:48.266Z

Params: ()

Result: JavaRDD[V]



Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaPairRDD.html

Timestamp: 2020-10-19T01:56:48.266Z
sourceraw docstring

versionclj

(version)
(version spark)

Params:

Result: String

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html

Timestamp: 2020-10-19T01:56:49.576Z

Params: 

Result: String



Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html

Timestamp: 2020-10-19T01:56:49.576Z
sourceraw docstring

whole-text-filescljmultimethod

Params: (path: String, minPartitions: Int)

Result: JavaPairRDD[String, String]

Read a directory of text files from HDFS, a local file system (available on all nodes), or any Hadoop-supported file system URI. Each file is read as a single record and returned in a key-value pair, where the key is the path of each file, the value is the content of each file. The text files must be encoded as UTF-8.

For example, given a directory of files, the resulting rdd contains one (file path, file content) pair per file.

A suggestion value of the minimal splitting number for input data.

Small files are preferred; large files are also allowable, but they may cause bad performance.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html

Timestamp: 2020-10-19T01:56:49.582Z

Params: (path: String, minPartitions: Int)

Result: JavaPairRDD[String, String]

Read a directory of text files from HDFS, a local file system (available on all nodes), or any
Hadoop-supported file system URI. Each file is read as a single record and returned in a
key-value pair, where the key is the path of each file, the value is the content of each file.
The text files must be encoded as UTF-8.

For example, given a directory of files, the resulting rdd contains one (file path, file content) pair per file.

A suggestion value of the minimal splitting number for input data.

Small files are preferred; large files are also allowable, but they may cause bad performance.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaSparkContext.html

Timestamp: 2020-10-19T01:56:49.582Z
sourceraw docstring

zipclj

(zip left right)

Params: (other: JavaRDDLike[U, _])

Result: JavaPairRDD[T, U]

Zips this RDD with another one, returning key-value pairs with the first element in each RDD, second element in each RDD, etc. Assumes that the two RDDs have the same number of partitions and the same number of elements in each partition (e.g. one was made through a map on the other).

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.950Z

Params: (other: JavaRDDLike[U, _])

Result: JavaPairRDD[T, U]

Zips this RDD with another one, returning key-value pairs with the first element in each RDD,
second element in each RDD, etc. Assumes that the two RDDs have the *same number of
partitions* and the *same number of elements in each partition* (e.g. one was made through
a map on the other).


Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.950Z
sourceraw docstring
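
A sketch of zipping two RDDs built from equally sized collections, so they have the same number of partitions and elements per partition:

  (def letters (rdd/parallelise ["a" "b" "c"]))
  (def numbers (rdd/parallelise [1 2 3]))

  (rdd/take (rdd/zip letters numbers) 3) ;; the pairs ("a" 1), ("b" 2) and ("c" 3)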

zip-partitionsclj

(zip-partitions left right f)

Params: (other: JavaRDDLike[U, _], f: FlatMapFunction2[Iterator[T], Iterator[U], V])

Result: JavaRDD[V]

Zip this RDD's partitions with one (or more) RDD(s) and return a new RDD by applying a function to the zipped partitions. Assumes that all the RDDs have the same number of partitions, but does not require them to have the same number of elements in each partition.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.952Z

Params: (other: JavaRDDLike[U, _], f: FlatMapFunction2[Iterator[T], Iterator[U], V])

Result: JavaRDD[V]

Zip this RDD's partitions with one (or more) RDD(s) and return a new RDD by
applying a function to the zipped partitions. Assumes that all the RDDs have the
*same number of partitions*, but does *not* require them to have the same number
of elements in each partition.


Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.952Z
sourceraw docstring

zip-with-indexclj

(zip-with-index rdd)

Params: ()

Result: JavaPairRDD[T, Long]

Zips this RDD with its element indices. The ordering is first based on the partition index and then the ordering of items within each partition. So the first item in the first partition gets index 0, and the last item in the last partition receives the largest index. This is similar to Scala's zipWithIndex but it uses Long instead of Int as the index type. This method needs to trigger a spark job when this RDD contains more than one partition.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.953Z

Params: ()

Result: JavaPairRDD[T, Long]

Zips this RDD with its element indices. The ordering is first based on the partition index
and then the ordering of items within each partition. So the first item in the first
partition gets index 0, and the last item in the last partition receives the largest index.
This is similar to Scala's zipWithIndex but it uses Long instead of Int as the index type.
This method needs to trigger a spark job when this RDD contains more than one partition.


Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.953Z
sourceraw docstring

zip-with-unique-idclj

(zip-with-unique-id rdd)

Params: ()

Result: JavaPairRDD[T, Long]

Zips this RDD with generated unique Long ids. Items in the kth partition will get ids k, n+k, 2*n+k, ..., where n is the number of partitions. So there may exist gaps, but this method won't trigger a spark job, which is different from org.apache.spark.rdd.RDD#zipWithIndex.

Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.954Z

Params: ()

Result: JavaPairRDD[T, Long]

Zips this RDD with generated unique Long ids. Items in the kth partition will get ids k, n+k,
2*n+k, ..., where n is the number of partitions. So there may exist gaps, but this method
won't trigger a spark job, which is different from org.apache.spark.rdd.RDD#zipWithIndex.


Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/api/java/JavaRDD.html

Timestamp: 2020-10-19T01:56:48.954Z
sourceraw docstring
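
A sketch contrasting zip-with-index and zip-with-unique-id; the former assigns consecutive indices (and may trigger a job), the latter assigns unique but possibly non-consecutive ids without one:

  (def items (rdd/parallelise ["a" "b" "c"]))

  (rdd/take (rdd/zip-with-index items) 3)     ;; ("a" 0), ("b" 1), ("c" 2)
  (rdd/take (rdd/zip-with-unique-id items) 3) ;; unique Long ids, gaps possible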
