Text splitter by individual characters.
Size of the chunks into which the text gets split.
(chunk-text {:splitter/keys [unit] :as opts} text)
Chunk `text` into `chunk-size` blocks using the specified `splitter`. Optionally, `overlap` sets how many text units consecutive chunks can share (defaults to 0).
Supported text splitters:
- `sentence-splitter`
- `character-splitter`
- `token-splitter`
Number of units by which chunks can overlap.
Text splitter by sentences. It uses the OpenNLP sentence splitter to partition the text.
Split handlers turn text into the specified text units via the `encode` function. The `decode` function turns those units back into a single text string.
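The encode/decode pairing can be illustrated with a minimal, self-contained sketch for character units. The `character-handler` map and its `:encode`/`:decode` keys are hypothetical names chosen for this example; the library's actual handler definitions may be shaped differently.

```clojure
(require '[clojure.string :as string])

;; Hypothetical handler map for illustration only,
;; not the library's own definition.
(def character-handler
  {;; string -> sequence of one-character strings
   :encode (fn [text] (map str text))
   ;; sequence of units -> single string
   :decode (fn [units] (string/join units))})

(let [{:keys [encode decode]} character-handler
      units (encode "chunk")]
  ;; Print the units, then the text rebuilt from them.
  (println units)
  (println (decode units)))
```

The property that matters is the round trip: decoding the encoded units of a text yields the original text, so chunks built from units can always be turned back into strings.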
Lexical units in which the text gets split: character, token, sentence.
(text-splitter {:splitter/keys [chunk-size overlap] :or {overlap 0}} text-units)
Text splitter by tokens. Tokenization is done based on the provided model.
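To make the `chunk-size` and `overlap` options concrete, here is a rough sketch of how a sequence of text units could be cut into overlapping chunks. `split-units` is a hypothetical stand-in that only reuses the destructuring form from the signature above; it is an assumption about the behaviour, not the library's implementation.

```clojure
(defn split-units
  "Hypothetical sketch: cut `text-units` into chunks of `chunk-size`
  units, where consecutive chunks share `overlap` units."
  [{:splitter/keys [chunk-size overlap] :or {overlap 0}} text-units]
  (let [step (- chunk-size overlap)]
    ;; A non-positive step would never advance through the input.
    (assert (pos? step) "overlap must be smaller than chunk-size")
    ;; Each chunk starts `step` units after the previous one;
    ;; partition-all keeps a final chunk shorter than chunk-size.
    (partition-all chunk-size step text-units)))

;; Chunks of 4 units, each sharing 1 unit with its predecessor.
(split-units {:splitter/chunk-size 4 :splitter/overlap 1} (range 10))
;; => ((0 1 2 3) (3 4 5 6) (6 7 8 9) (9))

;; Overlap omitted, so the :or default of 0 applies.
(split-units {:splitter/chunk-size 3} (range 7))
;; => ((0 1 2) (3 4 5) (6))
```

Note that with `partition-all` the last chunk can consist only of units already present in the previous chunk, as in the first call. Whether such a tail should be kept or dropped is a design choice this sketch leaves open.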