
zero-one.geni.arrow


collect-to-arrow (clj)

(collect-to-arrow rdd chunk-size out-dir)

Collects the dataframe on the driver and exports it as Arrow files. The data is transferred one partition at a time, so each partition should be small enough to fit in the driver's heap space. The data is then saved to disk as Arrow files in chunks of chunk-size rows.

rdd: Spark dataset.
chunk-size: Number of rows each Arrow file will have. Should be small enough for the data to fit in the driver's heap space.
out-dir: Output directory for the Arrow files.
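
To make the effect of chunk-size concrete, here is a minimal sketch in plain Clojure of how a block of rows could split into files. It is not geni's implementation: the helper name `chunk-sizes` is hypothetical, and it assumes that every file holds exactly chunk-size rows except possibly the last, which holds the remainder. Whether chunks can span partition boundaries is not stated in the docstring.

```clojure
;; Hypothetical helper, for illustration only; not part of zero-one.geni.arrow.
(defn chunk-sizes
  "Returns a vector with the number of rows in each output file when
  n-rows rows are written in chunks of chunk-size rows (assumed scheme)."
  [n-rows chunk-size]
  (let [full-chunks (quot n-rows chunk-size)
        remainder   (rem n-rows chunk-size)]
    ;; One entry per full chunk, plus a final smaller chunk if rows are left over.
    (cond-> (vec (repeat full-chunks chunk-size))
      (pos? remainder) (conj remainder))))

(chunk-sizes 10 4) ;; => [4 4 2]
(chunk-sizes 8 4)  ;; => [4 4]
```

Under this assumption, a smaller chunk-size yields more, smaller files and keeps less data in driver memory per write. The real call follows the signature above: (collect-to-arrow rdd chunk-size out-dir).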


typed-action (clj)

(typed-action action col-type value-info row-info col-name allocator)