(collect-to-arrow rdd chunk-size out-dir)
Collects the dataframe on the driver and exports it as Arrow files.
The data is transferred one partition at a time, so each partition should be small
enough to fit in the driver's heap space. The data is then saved to disk as
Arrow files in chunks of `chunk-size` rows.
rdd
Spark dataset
chunk-size
Number of rows each Arrow file will have. Should be small enough for the data to fit in the driver's heap space.
out-dir
Output directory for the Arrow files
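To make the effect of `chunk-size` concrete, here is a minimal sketch in plain Clojure of how a given number of rows would split into files. It is not the library's implementation: `chunk-row-counts` is a hypothetical helper, and the sketch assumes that a final, smaller file holds any leftover rows and ignores how chunks interact with partition boundaries.

```clojure
;; Hypothetical helper for illustration only; not part of the library.
;; Assumes leftover rows go into one final, smaller chunk.
(defn chunk-row-counts
  "Returns a vector with the number of rows in each chunk when `n-rows`
  rows are split into chunks of at most `chunk-size` rows."
  [n-rows chunk-size]
  (let [full-chunks (quot n-rows chunk-size)
        leftover    (rem n-rows chunk-size)]
    (cond-> (vec (repeat full-chunks chunk-size))
      (pos? leftover) (conj leftover))))

;; 10 rows with a chunk-size of 4: two full chunks and one partial chunk.
(chunk-row-counts 10 4)
```

Under these assumptions, a smaller `chunk-size` produces more, smaller files and lowers the amount of data held on the driver at once.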
(typed-action action col-type value-info row-info col-name allocator)