Recursive text splitting (chunking) for RAG / LLM pipelines.
`split` breaks text into overlapping chunks no larger than a target size, trying an
ordered list of separators from coarsest (paragraph) to finest (character) so that
chunks land on natural boundaries. Size is measured by `:length-fn` (default `count`,
i.e. characters); pass a token counter (e.g. tokenizers-clj's `count-tokens`) to chunk
by tokens instead, which is the correct unit when feeding a model with a token limit.
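To make the coarsest-to-finest idea concrete, here is a minimal sketch of recursive splitting with a pluggable length function. It is not this library's implementation: the namespace `example.chunk-sketch`, the function `split-sketch`, and the option keys `:chunk-size` and `:separators` are hypothetical names chosen for illustration (only `:length-fn` and its `count` default come from the description above). For brevity the sketch also omits chunk overlap, and it targets JVM Clojure.

```clojure
(ns example.chunk-sketch
  (:require [clojure.string :as str])
  (:import (java.util.regex Pattern)))

(defn- split-on
  "Split text on a literal separator; the empty separator yields single characters."
  [text sep]
  (if (= sep "")
    (map str text)
    (str/split text (re-pattern (Pattern/quote sep)))))

(defn split-sketch
  "Hypothetical illustration: greedily pack pieces split on the coarsest
  separator, recursing with finer separators into any piece that is still
  larger than :chunk-size as measured by :length-fn. No overlap is applied."
  [text {:keys [chunk-size separators length-fn]
         :or   {separators ["\n\n" "\n" " " ""]
                length-fn  count}
         :as   opts}]
  (if (or (<= (length-fn text) chunk-size) (empty? separators))
    [text]
    (let [[sep & finer] separators
          fits? #(<= (length-fn %) chunk-size)
          ;; append the pending chunk to the output unless it is empty/blank
          emit  (fn [out cur] (if (str/blank? cur) out (conj out cur)))]
      (loop [parts (split-on text sep), cur nil, out []]
        (if-let [[p & more] (seq parts)]
          (let [joined (if cur (str cur sep p) p)]
            (cond
              ;; the piece still fits in the pending chunk: keep packing
              (fits? joined) (recur more joined out)
              ;; the piece fits on its own: close the pending chunk, start a new one
              (fits? p)      (recur more p (emit out cur))
              ;; the piece is too large by itself: retry with finer separators
              :else          (recur more nil
                                    (into (emit out cur)
                                          (split-sketch p (assoc opts :separators finer))))))
          (emit out cur))))))

;; Size measured in characters (the default length function):
(split-sketch "aaa bbb ccc\n\nddd eee" {:chunk-size 7})

;; Size measured by a stand-in "token" counter that counts whitespace-separated
;; words; a real tokenizer's counting function would be passed the same way.
(split-sketch "aaa bbb ccc\n\nddd eee"
              {:chunk-size 3
               :length-fn  #(count (re-seq #"\S+" %))})
```

Because the size check goes through `:length-fn` everywhere, swapping the unit from characters to tokens changes where chunks break without touching the splitting logic.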