Text splitter by individual characters.
Size of the chunks into which the text gets split.
(chunk-text {:splitter/keys [unit] :as opts} text)
Chunk `text` into `chunk-size` blocks using the specified `splitter`. Optionally, `overlap` specifies by how many text units adjacent chunks may overlap (defaults to 0).

Supported text splitters:
- `sentence-splitter`
- `character-splitter`
- `token-splitter`
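To make the interaction between `chunk-size` and `overlap` concrete, here is a minimal sketch that chunks a string by characters. It is not the library's implementation: `chunk-units` and `chunk-characters` are hypothetical helper names, and the sketch assumes that each chunk starts `chunk-size - overlap` units after the previous one.

```clojure
;; Hypothetical helpers for illustration only; not part of the documented API.

(defn chunk-units
  "Split a sequence of text units into chunks of at most `chunk-size` units,
  where each chunk starts (- chunk-size overlap) units after the previous one."
  [chunk-size overlap units]
  (let [step (- chunk-size overlap)]
    ;; An overlap as large as the chunk would never advance through the text.
    (assert (pos? step) "overlap must be smaller than chunk-size")
    (partition-all chunk-size step units)))

(defn chunk-characters
  "Chunk `text` by individual characters and join each chunk back into a string."
  [chunk-size overlap text]
  (mapv #(apply str %) (chunk-units chunk-size overlap (seq text))))

(chunk-characters 4 1 "abcdefghij")
;; => ["abcd" "defg" "ghij" "j"]
```

Each chunk repeats the last character of the previous one. Because the sketch relies on `partition-all`, it can also emit a short trailing chunk (`"j"` here) that is already covered by the chunk before it; a real splitter may handle the tail differently.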
Number of units by which chunks can overlap.
Text splitter by sentences. It uses the OpenNLP sentence splitter to partition the text.
Split handlers turn text into the specified text units via the `encode` function. The `decode` function turns those units back into a single text string.
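The encode/decode pairing can be pictured with a small sketch. The map shape and the name `character-handler` are assumptions made for illustration; the actual handler structure in the library may differ.

```clojure
;; Hypothetical handler for character units; the :encode/:decode keys
;; are an assumed shape, not the library's confirmed structure.
(def character-handler
  {:encode (fn [text] (map str text))        ; "abc" -> ("a" "b" "c")
   :decode (fn [units] (apply str units))})  ; ("a" "b" "c") -> "abc"

(let [{:keys [encode decode]} character-handler]
  (decode (encode "chunk me")))
;; => "chunk me"
```

The property that matters is the round trip: decoding the units produced by encoding should give back the original text, so that chunks built from units can be turned into strings again.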
Lexical units in which the text gets split: character, token, sentence.
(text-splitter {:splitter/keys [chunk-size overlap] :or {overlap 0}} text-units)
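The argument vector above uses namespaced-key destructuring with a default for `overlap`. The following sketch shows what such a signature binds; `describe-opts` is a hypothetical function written only to demonstrate the destructuring and is not part of the library.

```clojure
;; Hypothetical function; it mirrors the destructuring form of the
;; signature above and simply reports what was bound.
(defn describe-opts
  [{:splitter/keys [chunk-size overlap] :or {overlap 0}}]
  {:chunk-size chunk-size
   :overlap    overlap})

(describe-opts {:splitter/chunk-size 10})
;; => {:chunk-size 10, :overlap 0}

(describe-opts {:splitter/chunk-size 10 :splitter/overlap 2})
;; => {:chunk-size 10, :overlap 2}
```

Options are therefore passed as a map with `:splitter/chunk-size` and, optionally, `:splitter/overlap`; leaving the latter out yields an overlap of 0.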
Text splitter by tokens. Tokenization is done based on the provided model.