(collect-to-arrow rdd chunk-size out-dir)
Collects the dataframe on the driver and exports it as Arrow files.
The data is transferred one partition at a time, so each partition should be
small enough to fit in the driver's heap space. The data is then saved to
disk as Arrow files in chunks of chunk-size rows.
rdd         Spark dataset
chunk-size  Number of rows each Arrow file will have. Should be small
            enough for the data to fit in the driver's heap space.
out-dir     Output directory for the Arrow files
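
The chunking step described above can be illustrated with a minimal sketch in plain Clojure. This is not the library's implementation: `chunk-rows` is a hypothetical helper, the rows are stand-in integers rather than Spark rows, and how the library actually names or writes each Arrow file is not covered by the docstring and is left out here.

```clojure
;; Hypothetical helper, not part of the library: groups a sequence of rows
;; into chunks of at most `chunk-size` rows, the way each output file is
;; described as holding `chunk-size` rows. The last chunk may be smaller.
(defn chunk-rows
  [rows chunk-size]
  (map vec (partition-all chunk-size rows)))

;; Ten stand-in rows split into chunks of four rows each.
(def chunks (chunk-rows (range 10) 4))

(println (count chunks)) ; number of chunks, i.e. files that would be written
(println chunks)
```

Under this assumption, a dataset of N rows would yield ceil(N / chunk-size) files. A call to the documented function would take the form `(collect-to-arrow df 10000 "/tmp/arrow-out")`, where `df` and the output path are placeholders.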