
scicloj.ml.smile.nlp


->vocabulary-top-n

(->vocabulary-top-n bows n)

bow->something-sparse

(bow->something-sparse ds bow-col indices-col bow->sparse-fn options)

Converts a bag-of-words column `bow-col` to a sparse data column `indices-col`.
The exact transformation to the sparse representation is given by `bow->sparse-fn`.

bow->sparse

(bow->sparse ds bow-col indices-col bow->sparse-fn vocabulary)

bow->sparse-and-vocab

(bow->sparse-and-vocab ds
                       bow-col
                       indices-col
                       bow->sparse-fn
                       {:keys [create-vocab-fn]
                        :or {create-vocab-fn create-vocab-all}})

Converts a bag-of-words column `bow-col` to a sparse data column `indices-col`.
The exact transformation to the sparse representation is given by `bow->sparse-fn`.

bow->sparse-indices

(bow->sparse-indices bow vocab->index-map)

Converts the token frequencies to the sparse vectors needed by Maxent.
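To make the idea of a sparse index representation concrete, here is a minimal sketch in plain Clojure. It is not the library's implementation: `bow->indices` is a hypothetical helper, and returning a sorted vector of vocabulary indices (dropping tokens that are not in the vocabulary) is an assumption about what such a representation could look like.

```clojure
;; Hypothetical helper, not part of scicloj.ml.smile.nlp.
;; Looks up each token of the bag of words in the vocabulary index map
;; and keeps the indices of the tokens that are found.
(defn bow->indices [bow vocab->index]
  (->> (keys bow)
       (keep vocab->index) ; tokens missing from the vocabulary yield nil and are dropped
       sort
       vec))

(bow->indices {"cat" 2 "sat" 1 "unknown" 1}
              {"cat" 0 "dog" 1 "sat" 2})
;; => [0 2]
```

Note that the counts are discarded in this sketch; only the presence of a token matters.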

bow->tfidf

(bow->tfidf ds bow-column tfidf-column)

Calculates the tf-idf scores from the bags of words (token-frequency maps)
in column `bow-column` and stores them in a new column `tfidf-column` as maps of token -> tfidf-score.
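The following sketch shows one common way to turn token-frequency maps into tf-idf maps. It assumes the textbook definitions (term frequency as count divided by document length, inverse document frequency as the natural log of total documents over documents containing the term); the library may use a different variant. The names `term-freq`, `inverse-doc-freq` and `tfidf-map` are hypothetical and are not the library's `tf`, `idf` or `tfidf`.

```clojure
;; Hypothetical helpers, not part of scicloj.ml.smile.nlp.

;; Share of the document's tokens that are `term`.
(defn term-freq [term bow]
  (/ (double (get bow term 0))
     (reduce + (vals bow))))

;; log(number of documents / number of documents containing `term`).
(defn inverse-doc-freq [term bows]
  (let [docs-with-term (count (filter #(contains? % term) bows))]
    (Math/log (/ (double (count bows)) docs-with-term))))

;; Map of token -> tf-idf score for one bag of words.
;; `bow` is expected to be one of the maps in `bows`.
(defn tfidf-map [bow bows]
  (into {}
        (map (fn [term]
               [term (* (term-freq term bow)
                        (inverse-doc-freq term bows))]))
        (keys bow)))

(def bows [{"cat" 2 "sat" 1}
           {"dog" 1 "sat" 1}])

(tfidf-map (first bows) bows)
;; "sat" occurs in every document, so its score is 0.0;
;; "cat" occurs only in the first document and scores about 0.462.
```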

count-vectorize

(count-vectorize ds text-col bow-col)
(count-vectorize
  ds
  text-col
  bow-col
  {:keys [text->bow-fn] :or {text->bow-fn default-text->bow} :as options})

Converts text column `text-col` to bag-of-words representation
in the form of a frequency-count map.
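As an illustration of what a frequency-count map looks like for a single text, here is a simplified stand-in in plain Clojure. `text->freqs` is a hypothetical helper; its regex-based tokenization is an assumption and is not how `default-text->bow` tokenizes.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper, not part of scicloj.ml.smile.nlp.
;; Lower-cases the text, splits on runs of non-letters, and counts tokens.
(defn text->freqs [text]
  (->> (str/split (str/lower-case text) #"[^a-z]+")
       (remove str/blank?) ; a leading separator produces an empty first token
       frequencies))

(text->freqs "The cat sat on the mat")
;; => {"the" 2, "cat" 1, "sat" 1, "on" 1, "mat" 1}
```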

create-vocab-all

(create-vocab-all bow)

Uses all tokens as the vocabulary.
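A vocabulary built from all tokens can be sketched as follows. This is an illustration only: `all-tokens` and `index-map` are hypothetical helpers, and both the sorted ordering and the shape of the token-to-index map are assumptions, not the library's documented behaviour.

```clojure
;; Hypothetical helpers, not part of scicloj.ml.smile.nlp.

;; Collects every distinct token that appears in any bag of words.
(defn all-tokens [bows]
  (->> bows (mapcat keys) distinct sort vec))

;; Assigns each vocabulary token a position, starting at 0.
(defn index-map [vocabulary]
  (zipmap vocabulary (range)))

(def vocab (all-tokens [{"cat" 2 "sat" 1}
                        {"dog" 1 "sat" 1}]))
;; vocab => ["cat" "dog" "sat"]

(index-map vocab)
;; => {"cat" 0, "dog" 1, "sat" 2}
```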

default-text->bow

(default-text->bow text options)

Converts text to token counts (a map token -> count).
Takes options:
`stopwords`: either a keyword naming a default Smile dictionary (:default, :google, :comprehensive, :mysql) or a seq of stop words.
`stemmer`: either :none or :porter, where :porter selects the Porter stemmer.
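The effect of passing a seq of stop words can be sketched in plain Clojure. `text->bow-sketch` is a hypothetical function, not the library's `default-text->bow`: it assumes the option is supplied under the key `:stopwords`, handles only a seq of words (not the dictionary keywords), and does no stemming.

```clojure
(require '[clojure.string :as str])

;; Hypothetical function, not part of scicloj.ml.smile.nlp.
;; Counts tokens after dropping every token listed in `stopwords`.
(defn text->bow-sketch [text {:keys [stopwords] :or {stopwords #{}}}]
  (let [stop? (set stopwords)]
    (->> (str/split (str/lower-case text) #"[^a-z]+")
         (remove str/blank?)
         (remove stop?)
         frequencies)))

(text->bow-sketch "The cat sat on the mat" {:stopwords ["the" "on"]})
;; => {"cat" 1, "sat" 1, "mat" 1}
```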

default-tokenize

(default-tokenize text options)

Tokenizes text.
Stemming can be configured with the :stemmer option.

freqs->SparseArray

(freqs->SparseArray freq-map vocab->index-map)

idf

(idf tf-map term bows)

resolve-stopwords

(resolve-stopwords stopwords-option)

tf

(tf term bow)

tf-map

(tf-map bows)

tfidf

(tfidf tf-map term bow bows)

word-process

(word-process stemmer normalizer word)
