(->vocabulary-top-n bows n)
(bow->something-sparse ds bow-col indices-col bow->sparse-fn options)
Converts a bag-of-words column `bow-col` to a sparse data column `indices-col`. The exact transformation to the sparse representation is given by `bow->sparse-fn`.
(bow->sparse ds bow-col indices-col bow->sparse-fn vocabulary)
(bow->sparse-and-vocab ds bow-col indices-col bow->sparse-fn {:keys [create-vocab-fn] :or {create-vocab-fn create-vocab-all}})
Converts a bag-of-words column `bow-col` to a sparse data column `indices-col`. The exact transformation to the sparse representation is given by `bow->sparse-fn`.
(bow->sparse-indices bow vocab->index-map)
Converts the token-frequencies to the sparse vectors needed by Maxent
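To illustrate the idea of turning token frequencies into sparse indices, here is a minimal sketch in plain Clojure. The helper name `sketch-bow->indices` is hypothetical and not part of this library, and the return type is an assumption: the library function may well return a different structure (for example a Java array) than the plain vector used here.

```clojure
;; Hypothetical helper, not the library implementation.
;; Keeps only the tokens that occur in the vocabulary and returns
;; their indices in ascending order. Counts are ignored in this sketch.
(defn sketch-bow->indices [bow vocab->index-map]
  (->> (keys bow)
       (keep vocab->index-map) ; tokens outside the vocabulary yield nil and are dropped
       sort
       vec))

(sketch-bow->indices {"cat" 2 "dog" 1 "emu" 4}
                     {"cat" 0 "dog" 1 "fox" 2})
;; => [0 1]
```

Tokens that are missing from `vocab->index-map` (here `"emu"`) simply do not appear in the result.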
(bow->tfidf ds bow-column tfidf-column)
Calculates the TF-IDF scores from the bag-of-words (token frequency maps) in column `bow-column` and stores them in a new column `tfidf-column` as maps of token->tfidf-score.
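The docstring does not state which TF-IDF variant is used. As a rough sketch only, the following plain Clojure code assumes one common formulation: term frequency is the token count divided by the total token count of the document, and inverse document frequency is the natural log of the number of documents divided by the number of documents containing the term. All function names are hypothetical, and the library's weighting may differ.

```clojure
;; Hypothetical helpers illustrating one common TF-IDF variant.
;; A "bow" is a map token -> count; "bows" is a seq of such maps.

(defn sketch-tf
  "Relative frequency of term within a single bow."
  [term bow]
  (/ (double (get bow term 0))
     (reduce + (vals bow))))

(defn sketch-idf
  "log(N / df), where df is the number of bows containing term.
  Assumes term occurs in at least one of the bows."
  [term bows]
  (let [df (count (filter #(contains? % term) bows))]
    (Math/log (/ (double (count bows)) df))))

(defn sketch-bow->tfidf
  "Returns a map token -> tfidf score for one bow of the corpus."
  [bow bows]
  (into {}
        (map (fn [[term _]]
               [term (* (sketch-tf term bow) (sketch-idf term bows))]))
        bow))

(def example-bows [{"cat" 1 "dog" 1} {"cat" 2}])

(sketch-bow->tfidf (first example-bows) example-bows)
;; "cat" occurs in every document, so its score is 0.0;
;; "dog" scores 0.5 * ln 2.
```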
(count-vectorize ds text-col bow-col)
(count-vectorize ds text-col bow-col {:keys [text->bow-fn] :or {text->bow-fn default-text->bow} :as options})
Converts the text column `text-col` to a bag-of-words representation in the form of a frequency-count map.
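A frequency-count map of this kind can be sketched for a single string in plain Clojure. The helper `sketch-text->bow` below is hypothetical and uses a deliberately naive tokenizer (lower-casing and splitting on non-letters); it only shows the shape of the result and is not the library's tokenization.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper, not the library implementation.
;; Lower-cases the text, splits on runs of non-letter characters,
;; and counts how often each token occurs.
(defn sketch-text->bow [text]
  (->> (str/split (str/lower-case text) #"[^a-z]+")
       (remove str/blank?) ; a leading separator would produce an empty token
       frequencies))

(sketch-text->bow "The cat saw the other cat")
;; => {"the" 2, "cat" 2, "saw" 1, "other" 1}
```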
(create-vocab-all bow)
Uses all tokens as the vocabulary
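As a sketch of what building a vocabulary involves, the plain Clojure code below collects all distinct tokens from a sequence of token-frequency maps, and also shows a top-n variant in the spirit of `->vocabulary-top-n`, whose behaviour is inferred here from its name only. All helper names are hypothetical, and the assumption that the input is a seq of frequency maps may not match the library's actual argument.

```clojure
;; Hypothetical helpers, not the library implementation.
;; "bows" is assumed to be a seq of maps token -> count.

(defn sketch-vocab-all
  "All distinct tokens across the bows."
  [bows]
  (distinct (mapcat keys bows)))

(defn sketch-vocab-top-n
  "The n tokens with the highest total count across the bows."
  [bows n]
  (->> (apply merge-with + bows) ; sum the counts per token
       (sort-by val >)
       (take n)
       (map key)))

(defn sketch-vocab->index-map
  "Assigns each vocabulary token a consecutive index starting at 0."
  [vocab]
  (zipmap vocab (range)))

(def example-bows [{"a" 3 "b" 1} {"b" 1 "c" 5}])

(sketch-vocab-top-n example-bows 2)
;; => ("c" "a")

(sketch-vocab->index-map (sketch-vocab-top-n example-bows 2))
;; => {"c" 0, "a" 1}
```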
(default-text->bow text options)
Converts text to token counts (a map token -> count). Takes options: `stopwords`, either a keyword naming a default Smile dictionary (`:default`, `:google`, `:comprehensive`, `:mysql`) or a seq of stop words; `stemmer`, either `:none` or `:porter` to select the Porter stemmer.
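The effect of a stop-word option can be sketched in plain Clojure as follows. The helper `sketch-text->bow-with-stopwords` is hypothetical: it only handles the case where `:stopwords` is a seq of words, uses a naive tokenizer, and leaves out stemming and the Smile dictionaries entirely.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper, not the library implementation.
;; Only the "seq of stop words" form of the :stopwords option is modelled.
(defn sketch-text->bow-with-stopwords [text {:keys [stopwords]}]
  (let [stop? (set stopwords)] ; a set works as a membership predicate
    (->> (str/split (str/lower-case text) #"[^a-z]+")
         (remove str/blank?)
         (remove stop?)
         frequencies)))

(sketch-text->bow-with-stopwords "The cat and the hat"
                                 {:stopwords ["the" "and"]})
;; => {"cat" 1, "hat" 1}
```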
(default-tokenize text options)
Tokenizes text. Use of a stemmer can be configured via the `:stemmer` option.
(freqs->SparseArray freq-map vocab->index-map)
(idf tf-map term bows)
(resolve-stemmer options)
(resolve-stopwords stopwords-option)
(tf term bow)
(tf-map bows)
(tfidf tf-map term bow bows)
(word-process stemmer normalizer word)