This namespace provides ways of filtering *stop word* tokens. To avoid the double negative in function names, *go words* are defined to be the complement of a stop word list with respect to the vocabulary. Functions like [[go-word?]] tell whether or not a token is a go word, that is, not a stop word. Stop words are defined to be:

* stopwords (predefined list)
* punctuation
* numbers
* non-alphabetic characters
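The stop word definition above can be illustrated with a minimal sketch. It is not this namespace's implementation: the name `sketch-go-word?`, the tiny `stop-words` set, the `:text` token key, and the ASCII-only alphabetic test are all assumptions made for illustration.

```clojure
(require '[clojure.string :as s])

;; Hypothetical stop word list; the library's predefined list is not shown here.
(def stop-words #{"the" "a" "an" "of" "and" "is"})

(defn sketch-go-word?
  "Hypothetical stand-in for go-word?. A token (assumed to be a map with a
  :text key) is a go word when its text is purely alphabetic (ASCII only in
  this sketch, which also rules out punctuation and numbers) and is not in
  the stop word list."
  [{:keys [text]}]
  (let [lower (s/lower-case text)]
    (boolean (and (re-matches #"[a-z]+" lower)
                  (not (contains? stop-words lower))))))

(sketch-go-word? {:text "Running"}) ; alphabetic and not listed: a go word
(sketch-go-word? {:text "the"})     ; listed stop word
(sketch-go-word? {:text "42"})      ; number
(sketch-go-word? {:text ","})       ; punctuation
```

Only the first call yields `true`; the other three tokens fall into one of the stop word categories listed above.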
Configuration for filtering stop words.

Keys
---
* **:post-tags** POS tags for *go words* (see namespace docs)
* **:word-form-fn** function run on the token in [[go-word-form]]; for example, if `#(-> % :lemma s/lower-case)` is given, then lemmatization is used (e.g. Running -> run)
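As a rough illustration of the two keys, the sketch below builds a plain map shaped like this configuration and applies its `:word-form-fn` to a token. The var name `example-config`, the tag values, and the token map with `:text` and `:lemma` keys are assumptions; only the key names and the lemmatizing function come from the documentation above.

```clojure
(require '[clojure.string :as s])

;; Hypothetical configuration map; the POS tag values are assumed.
(def example-config
  {:post-tags    #{"NN" "NNS" "VB" "VBG"}
   :word-form-fn #(-> % :lemma s/lower-case)})

;; Hypothetical token; a real token would come from the parser.
(def token {:text "Running" :lemma "run"})

;; Apply the configured word form function to the token.
((:word-form-fn example-config) token)
;; => "run"
```

Swapping the function for `#(-> % :text s/lower-case)` would keep the surface form ("running") instead of the lemma.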
(go-word-form token)
Canonical string form of a token, as used for word counts.
(go-word-forms tokens)
Filter tokens per [[go-word?]] and return their *form* based on [[go-word-form]].
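The filter-then-map behavior described here can be sketched as follows. This is an assumption-laden illustration rather than the library's code: `sketch-go-word-forms` is a hypothetical name, and it takes the predicate and the form function as explicit arguments instead of calling [[go-word?]] and [[go-word-form]].

```clojure
(require '[clojure.string :as s])

(defn sketch-go-word-forms
  "Hypothetical stand-in for go-word-forms: keep the tokens accepted by
  go-word-pred, then map each kept token to its form with form-fn."
  [go-word-pred form-fn tokens]
  (->> tokens
       (filter go-word-pred)
       (map form-fn)))

;; Hypothetical tokens, modeled as maps with :text and :lemma keys.
(def tokens
  [{:text "The" :lemma "the"}
   {:text "dogs" :lemma "dog"}
   {:text "," :lemma ","}
   {:text "Running" :lemma "run"}])

(sketch-go-word-forms
 ;; toy predicate: everything outside a two-element stop list is a go word
 (fn [tok] (not (contains? #{"the" ","} (s/lower-case (:text tok)))))
 ;; lemmatizing form function from the configuration docs
 #(-> % :lemma s/lower-case)
 tokens)
;; => ("dog" "run")
```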
(go-word? token)
Return whether a token is a *go* token.