chunk.core

Recursive text splitting (chunking) for RAG / LLM pipelines.

`split` breaks text into overlapping chunks no larger than a target size, trying an ordered list of separators from coarsest (paragraph) to finest (character) so chunks land on natural boundaries. Size is measured by `:length-fn` (default `count` = characters); pass a token counter (e.g. tokenizers-clj's `count-tokens`) to chunk by tokens instead, which is the correct unit when feeding a model with a token limit.
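To make the coarse-to-fine idea concrete, here is a minimal, self-contained sketch of recursive splitting with a pluggable length function. It is not `chunk.core`'s implementation or API: the function name `split-sketch`, the option `:max-size`, and the separator list are hypothetical names chosen for illustration, and chunk overlap is left out to keep the example short. Only `:length-fn` and its `count` default come from the docstring above.

```clojure
(require '[clojure.string :as str])

;; Assumed separator order, coarsest to finest: paragraph, line, word, character.
(def default-separators ["\n\n" "\n" " " ""])

(defn- split-on
  "Splits text on the literal separator sep; an empty sep splits into characters."
  [text sep]
  (if (= sep "")
    (map str text)
    (str/split text (re-pattern (java.util.regex.Pattern/quote sep)))))

(defn- merge-pieces
  "Greedily joins adjacent pieces with sep while the joined string fits max-size."
  [pieces sep max-size length-fn]
  (reduce (fn [chunks piece]
            (let [candidate (if (seq chunks)
                              (str (peek chunks) sep piece)
                              piece)]
              (if (and (seq chunks) (<= (length-fn candidate) max-size))
                (conj (pop chunks) candidate)
                (conj chunks piece))))
          []
          pieces))

(defn split-sketch
  "Hypothetical recursive splitter: returns a vector of chunks whose size,
  as measured by length-fn, is at most max-size whenever a separator allows it."
  [text {:keys [max-size length-fn separators]
         :or   {length-fn count separators default-separators}}]
  (let [fits? #(<= (length-fn %) max-size)]
    (cond
      (empty? text)       []
      (fits? text)        [text]
      (empty? separators) [text] ; nothing finer to try; emit the oversize piece
      :else
      (let [[sep & finer] separators
            pieces        (remove empty? (split-on text sep))]
        (->> pieces
             ;; Runs of pieces that fit are merged at this level; oversize
             ;; pieces are split again with the finer separators.
             (partition-by fits?)
             (mapcat (fn [group]
                       (if (fits? (first group))
                         (merge-pieces group sep max-size length-fn)
                         (mapcat #(split-sketch % {:max-size   max-size
                                                   :length-fn  length-fn
                                                   :separators finer})
                                 group))))
             vec)))))

;; Character-based sizing (the default length function, count):
(split-sketch "one two\n\nthree four five" {:max-size 10})
;; => ["one two" "three four" "five"]

;; Swapping in a stand-in "token" counter (whitespace-separated words here;
;; a real tokenizer's counting function would take its place):
(split-sketch "aa bb cc dd" {:max-size 2
                             :length-fn #(count (re-seq #"\S+" %))})
;; => ["aa bb" "cc dd"]
```

The second call shows why the length function matters: the same text yields different chunk boundaries depending on whether the budget is counted in characters or in tokens.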

