(->vocabulary-top-n bows n)

(bow->something-sparse ds bow-col indices-col bow->sparse-fn options)
Converts a bag-of-words column `bow-col` to a sparse data column `indices-col`. The exact transformation to the sparse representation is given by `bow->sparse-fn`.
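As a rough illustration of how a pluggable `bow->sparse-fn` can work, the sketch below maps a column of bags-of-words to a sparse column in plain Clojure. It assumes a "dataset" is just a vector of row maps and that the conversion fn receives a token -> index map; `bow->something-sparse-sketch`, `->index-counts`, and `->index-freqs` are hypothetical names, not part of the library.

```clojure
;; Hypothetical sketch: a "dataset" is a vector of row maps and the
;; conversion fn receives a token->index map; neither is the library's API.
(defn bow->something-sparse-sketch
  "Stores (bow->sparse-fn bow vocab->index) under `indices-col` in each row."
  [rows bow-col indices-col bow->sparse-fn vocab->index]
  (mapv (fn [row]
          (assoc row indices-col
                 (bow->sparse-fn (get row bow-col) vocab->index)))
        rows))

;; Two interchangeable conversion fns.
(defn ->index-counts
  "index -> raw count, skipping tokens outside the vocabulary."
  [bow vocab->index]
  (into (sorted-map)
        (keep (fn [[token cnt]]
                (when-let [i (vocab->index token)] [i cnt])))
        bow))

(defn ->index-freqs
  "index -> count divided by the document's total token count
  (out-of-vocabulary tokens still count toward the total)."
  [bow vocab->index]
  (let [total (reduce + (vals bow))]
    (into (sorted-map)
          (map (fn [[i cnt]] [i (/ cnt total)]))
          (->index-counts bow vocab->index))))

(def vocab->index {"cat" 0 "dog" 1 "fish" 2})
(def rows [{:bow {"dog" 2 "cat" 1}} {:bow {"fish" 3 "bird" 1}}])

(mapv :sparse (bow->something-sparse-sketch rows :bow :sparse ->index-counts vocab->index))
;; => [{0 1, 1 2} {2 3}]
(mapv :sparse (bow->something-sparse-sketch rows :bow :sparse ->index-freqs vocab->index))
;; => [{0 1/3, 1 2/3} {2 3/4}]
```

Swapping the conversion fn changes the sparse representation while the column plumbing stays the same, which is presumably why `bow->sparse-fn` is passed in as an argument.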

(bow->sparse ds bow-col indices-col bow->sparse-fn vocabulary)

(bow->sparse-and-vocab ds
                       bow-col
                       indices-col
                       bow->sparse-fn
                       {:keys [create-vocab-fn]
                        :or {create-vocab-fn create-vocab-all}})
Converts a bag-of-words column `bow-col` to a sparse data column `indices-col`. The exact transformation to the sparse representation is given by `bow->sparse-fn`.
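The `create-vocab-fn` option suggests a two-step pipeline: build a vocabulary from all bags-of-words, then index it and convert each row. The sketch below assumes rows are plain maps and that the result bundles rows and vocabulary as `{:vocab ... :rows ...}`; every name is hypothetical. `vocab-top-n-sketch` is only a guess at what a top-n vocabulary could look like and does not claim to reproduce `->vocabulary-top-n`.

```clojure
;; Hypothetical sketch: rows are plain maps, not a tech.ml.dataset,
;; and none of these names are part of the library.
(defn create-vocab-all-sketch
  "Every distinct token across all bags-of-words, sorted."
  [bows]
  (->> bows (mapcat keys) distinct sort vec))

(defn vocab-top-n-sketch
  "Returns a vocabulary fn that keeps the n tokens with the highest
  total count (ties broken alphabetically)."
  [n]
  (fn [bows]
    (->> (apply merge-with + bows)
         (sort-by (fn [[token cnt]] [(- cnt) token]))
         (take n)
         (mapv key))))

(defn bow->sparse-and-vocab-sketch
  "Builds a vocabulary with `create-vocab-fn`, indexes it, converts every
  bag-of-words with `bow->sparse-fn`, and returns rows plus vocabulary."
  [rows bow-col indices-col bow->sparse-fn
   {:keys [create-vocab-fn] :or {create-vocab-fn create-vocab-all-sketch}}]
  (let [vocab        (create-vocab-fn (map #(get % bow-col) rows))
        vocab->index (zipmap vocab (range))]
    {:vocab vocab
     :rows  (mapv #(assoc % indices-col
                          (bow->sparse-fn (get % bow-col) vocab->index))
                  rows)}))

;; Usage: here the sparse form is simply the set of token indices present.
(def rows [{:bow {"the" 3 "cat" 1}} {:bow {"the" 2 "dog" 2}}])
(defn index-set [bow vocab->index] (set (keep vocab->index (keys bow))))

(:vocab (bow->sparse-and-vocab-sketch rows :bow :idx index-set {}))
;; => ["cat" "dog" "the"]
(:vocab (bow->sparse-and-vocab-sketch rows :bow :idx index-set
                                      {:create-vocab-fn (vocab-top-n-sketch 2)}))
;; => ["the" "dog"]
```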

(bow->sparse-indices bow vocab->index-map)
Converts the token frequencies to the sparse vectors needed by Maxent.
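A sketch of turning token frequencies into index arrays, assuming the Maxent input is one int array per document holding the vocabulary indices of the tokens present (binary features, so counts are dropped). `bow->sparse-indices-sketch` and `bows->feature-matrix-sketch` are hypothetical names, and the real function may produce a different shape.

```clojure
;; Hypothetical sketch. Assumption: each document becomes a sorted int
;; array of the vocabulary indices of the tokens it contains; the counts
;; themselves are discarded.
(defn bow->sparse-indices-sketch
  [bow vocab->index-map]
  (->> (keys bow)
       (keep vocab->index-map) ; out-of-vocabulary tokens are skipped
       sort
       int-array))

(defn bows->feature-matrix-sketch
  "Stacks the per-document int arrays into an int[][]."
  [bows vocab->index-map]
  (into-array (map #(bow->sparse-indices-sketch % vocab->index-map) bows)))

(def vocab->index-map {"a" 0 "b" 1 "c" 2})
(def x (bows->feature-matrix-sketch [{"c" 5 "a" 1} {"b" 2 "z" 1}] vocab->index-map))
(mapv vec x)
;; => [[0 2] [1]]
```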

(bow->tfidf ds bow-column tfidf-column)
Calculates the tf-idf scores from the bags-of-words (token-frequency maps) in column `bow-column` and stores them in a new column `tfidf-column` as maps of token -> tf-idf score.
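The docstring does not spell out the tf-idf formula. The sketch below uses one common variant, stated in the comments; the library's own `tf`, `idf`, and `tfidf` may differ (for example in smoothing or log base), and all names here are hypothetical.

```clojure
;; Hypothetical sketch of one common tf-idf variant:
;;   tf(t, d) = count of t in d / total token count of d
;;   idf(t)   = ln(N / number of documents containing t)
;;   tfidf    = tf * idf
(defn tf-sketch [term bow]
  (/ (double (get bow term 0)) (reduce + (vals bow))))

(defn doc-freqs-sketch
  "term -> number of bags-of-words that contain it."
  [bows]
  (frequencies (mapcat keys bows)))

(defn idf-sketch [doc-freqs term n-docs]
  (Math/log (/ (double n-docs) (get doc-freqs term))))

(defn bows->tfidf-sketch
  "Maps every bag-of-words to a map of token -> tf-idf score."
  [bows]
  (let [dfs (doc-freqs-sketch bows)
        n   (count bows)]
    (mapv (fn [bow]
            (into {}
                  (map (fn [[term _]]
                         [term (* (tf-sketch term bow) (idf-sketch dfs term n))]))
                  bow))
          bows)))

(bows->tfidf-sketch [{"the" 2 "cat" 2} {"the" 1 "dog" 1}])
;; "the" occurs in every document, so its idf and score are 0.0;
;; "cat" scores 0.5 * ln 2 (about 0.347).
```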

(count-vectorize ds text-col bow-col)
(count-vectorize
  ds
  text-col
  bow-col
  {:keys [text->bow-fn] :or {text->bow-fn default-text->bow} :as options})
Converts the text column `text-col` to a bag-of-words representation in the form of a frequency-count map.
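The `text->bow-fn` option makes the text-to-counts step replaceable. The sketch shows that plumbing over rows stored as maps, with a whitespace-splitting stand-in as the default; `count-vectorize-sketch` and `whitespace-text->bow` are hypothetical and do none of the stop-word or stemming work of `default-text->bow`.

```clojure
(require '[clojure.string :as str])

;; Hypothetical sketch: rows are plain maps. The stand-in text->bow fn
;; only splits on whitespace; it is not `default-text->bow`.
(defn whitespace-text->bow
  [text _options]
  (frequencies (str/split (str/trim text) #"\s+")))

(defn count-vectorize-sketch
  "Stores (text->bow-fn text options) under `bow-col` in each row."
  ([rows text-col bow-col]
   (count-vectorize-sketch rows text-col bow-col {}))
  ([rows text-col bow-col {:keys [text->bow-fn]
                           :or   {text->bow-fn whitespace-text->bow}
                           :as   options}]
   (mapv #(assoc % bow-col (text->bow-fn (get % text-col) options)) rows)))

(= {"the" 2 "cat" 2 "saw" 1}
   (:bow (first (count-vectorize-sketch [{:text "the cat saw the cat"}] :text :bow))))
;; => true

;; Plugging in a different fn, here character counts:
(:bow (first (count-vectorize-sketch [{:text "aba"}] :text :bow
                                     {:text->bow-fn (fn [t _] (frequencies t))})))
;; => {\a 2, \b 1}
```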

(create-vocab-all bow)
Uses all tokens as the vocabulary.

(default-text->bow text options)
Converts text to token counts (a map of token -> count). Takes the options:
- `stopwords`: either a keyword naming a default Smile dictionary (`:default`, `:google`, `:comprehensive`, `:mysql`) or a seq of stop words.
- `stemmer`: either `:none`, or `:porter` to select the Porter stemmer.
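The `stopwords` option accepts either a keyword or a seq. A minimal sketch of how such an option can be resolved and applied while counting tokens is shown here; the keyword-named sets are tiny hard-coded stand-ins rather than Smile's dictionaries, stemming is left out, and all names are hypothetical.

```clojure
(require '[clojure.string :as str])

;; Hypothetical sketch. The keyword-named sets are tiny stand-ins,
;; NOT the contents of Smile's stop-word dictionaries, and only the
;; stopwords option is handled (no stemming).
(def stopword-sets-sketch
  {:default #{"a" "an" "and" "of" "the"}
   :google  #{"a" "is" "of" "the"}})

(defn resolve-stopwords-sketch
  "nil -> no stop words; keyword -> named set; any other seq -> a set."
  [stopwords-option]
  (cond
    (nil? stopwords-option) #{}
    (keyword? stopwords-option)
    (or (get stopword-sets-sketch stopwords-option)
        (throw (ex-info "Unknown stop-word dictionary"
                        {:stopwords stopwords-option})))
    :else (set stopwords-option)))

(defn text->bow-sketch
  "Lowercases, splits on runs of non-letters, drops stop words, counts."
  [text {:keys [stopwords]}]
  (let [stop? (resolve-stopwords-sketch stopwords)]
    (->> (str/split (str/lower-case text) #"[^\p{L}]+")
         (remove str/blank?)
         (remove stop?)
         frequencies)))

(text->bow-sketch "The cat and the hat" {:stopwords :default})
;; => {"cat" 1, "hat" 1}
(text->bow-sketch "The cat and the hat" {:stopwords ["cat"]})
;; => {"the" 2, "and" 1, "hat" 1}
```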

(default-tokenize text options)
Tokenizes text. Whether a stemmer is used can be configured with the option `:stemmer`.
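Presumably stemming is applied per token after normalization; the signature `(word-process stemmer normalizer word)` hints at that composition. The sketch assumes this order (lowercase, then stem). `toy-stem` is a crude suffix stripper used only to keep the example dependency-free and is not the Porter algorithm, `:toy` is a hypothetical option value, and the other names are hypothetical as well.

```clojure
(require '[clojure.string :as str])

;; Hypothetical sketch. word-process-sketch only mirrors the argument order
;; of `word-process`; toy-stem is a deliberately crude stand-in, NOT Porter.
(defn word-process-sketch
  "Normalizes a word, then stems the normalized form."
  [stemmer normalizer word]
  (stemmer (normalizer word)))

(defn toy-stem
  "Strips one trailing \"s\" from words longer than three letters,
  unless the word ends in \"ss\"."
  [w]
  (if (and (> (count w) 3)
           (str/ends-with? w "s")
           (not (str/ends-with? w "ss")))
    (subs w 0 (dec (count w)))
    w))

(defn tokenize-sketch
  "Splits on runs of non-letters, then lowercases and optionally stems
  each token. :stemmer is :none (default) or the hypothetical :toy;
  any other value, including :porter, throws in this sketch."
  [text {:keys [stemmer] :or {stemmer :none}}]
  (let [stem-fn (case stemmer
                  :none identity
                  :toy  toy-stem)]
    (->> (str/split text #"[^\p{L}]+")
         (remove str/blank?)
         (mapv #(word-process-sketch stem-fn str/lower-case %)))))

(tokenize-sketch "Cats chase dogs across the grass." {})
;; => ["cats" "chase" "dogs" "across" "the" "grass"]
(tokenize-sketch "Cats chase dogs across the grass." {:stemmer :toy})
;; => ["cat" "chase" "dog" "across" "the" "grass"]
```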

(freqs->SparseArray freq-map vocab->index-map)
(idf tf-map term bows)
(resolve-stopwords stopwords-option)
(tf term bow)
(tf-map bows)
(tfidf tf-map term bow bows)
(word-process stemmer normalizer word)