A set of tools for using reducers over potentially very large text files.
(chunk-seq filename)
(chunk-seq filename buffer-size)
(chunk-seq filename buffer-size separator)
Returns a sequence of arrays from the underlying file seq. Useful for iterable folds that don't support the CollFold protocol.
(numbered-vec filename)
(numbered-vec filename chunk-size)
(numbered-vec filename chunk-size delim)
Return a NumberedFileVector, in which each line is prefixed with its line number, separated from the line by the provided delimiter (default: tab).
(rec-seq filename)
(rec-seq filename buffer-size)
(rec-seq filename buffer-size separator)
Almost the same as FileSeq, but the record separator can be a multibyte array, and newlines or separators are not stripped from the output strings.
(seq filename)
(seq filename buffer-size)
(seq filename buffer-size byte-separator)
Return a seq-like structure over an mmap'd file on disk. Performance is poor for typical ISeq access (first, next, etc.), but fast when reduced over.
You can provide a buffer size in bytes, which sets both the size of each read from disk and the smallest unit of data to fork. A byte can be provided to indicate the separator between records.
Default values are a 256KB buffer and separation on byte 10 (newline in ASCII).
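As a minimal sketch of the reduce-oriented usage described above, the following counts the records in a file with a parallel fold. It assumes the library is required under the namespace `iota` (the namespace is not named in this section); `count-lines` is a hypothetical helper name, and the `(r/filter identity)` step is a defensive assumption that some elements, such as empty lines, may come back as nil.

```clojure
(ns example.count-lines
  (:require [iota :as iota]                       ; assumed namespace name
            [clojure.core.reducers :as r]))

(defn count-lines
  "Counts the non-nil records in the file at `path` with a parallel fold.
  Buffer size and separator are left at their defaults."
  [path]
  (->> (iota/seq path)
       (r/filter identity)   ; drop nil elements (assumed possible for empty lines)
       (r/map (fn [_] 1))    ; map every remaining record to 1
       (r/fold +)))          ; sum partial counts; (+) supplies the identity 0
```

To override the defaults, the three-argument form documented above can be used instead, for example `(iota/seq path (* 1024 1024) 10)` for a 1MB buffer with newline-separated records.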
(subvec v start)
(subvec v start end)
Return a subset of the provided flatfileclj vector. If end is not provided, it defaults to (count v).
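One plausible use is skipping a header row before processing the rest of a file. The sketch below assumes the library is required under the namespace `iota`, that the vector comes from the `vec` function documented in this section, and that `count` works on the result of `subvec`; `count-data-rows` is a hypothetical helper name.

```clojure
(ns example.skip-header
  (:require [iota :as iota]))   ; assumed namespace name

(defn count-data-rows
  "Counts the lines after the first (header) line of the file at `path`."
  [path]
  (let [v    (iota/vec path)
        body (iota/subvec v 1)]  ; end omitted, so it defaults to (count v)
    (count body)))
```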
(vec filename)
(vec filename chunk-size)
(vec filename chunk-size byte-separator)
Return a vector-like structure mmap'd over a file on disk.
On creation, an index of the file is constructed so that random access is O(1), similar to a normal Clojure vector. This is significantly more memory efficient than a vector of Strings.
You can also provide the chunk size and a single-character delimiter (byte-separator).
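The random access described above might look like the following sketch. It assumes the library is required under the namespace `iota` and that the returned structure supports `count` and `nth` like a regular Clojure vector; `first-and-last` is a hypothetical helper name.

```clojure
(ns example.random-access
  (:require [iota :as iota]))   ; assumed namespace name

(defn first-and-last
  "Returns a pair of the first and last lines of the file at `path`,
  or nil when the file has no lines."
  [path]
  (let [v (iota/vec path)       ; index is built once, at creation
        n (count v)]
    (when (pos? n)
      [(nth v 0) (nth v (dec n))])))  ; indexed lookups, no scan of the file
```

Because the index is built when the vector is created, the up-front cost grows with file size, while each later lookup stays cheap.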