scicloj.ml.smile.nlp


Clojure only.

->vocabulary-top-n
bow->something-sparse
bow->sparse
bow->sparse-and-vocab
bow->sparse-indices
bow->tfidf
count-vectorize
create-vocab-all
default-text->bow
default-tokenize
freqs->SparseArray
idf
resolve-stopwords
tf
tf-map
tfidf
word-process

->vocabulary-top-n^clj

(->vocabulary-top-n bows n)

bow->something-sparse^clj

(bow->something-sparse ds bow-col indices-col bow->sparse-fn options)

Converts a bag-of-words column bow-col to a sparse data column indices-col. The exact transformation to the sparse representation is given by bow->sparse-fn.

bow->sparse^clj

(bow->sparse ds bow-col indices-col bow->sparse-fn vocabulary)

bow->sparse-and-vocab^clj

(bow->sparse-and-vocab ds
                       bow-col
                       indices-col
                       bow->sparse-fn
                       {:keys [create-vocab-fn]
                        :or {create-vocab-fn create-vocab-all}})

Converts a bag-of-words column bow-col to a sparse data column indices-col. The exact transformation to the sparse representation is given by bow->sparse-fn. The vocabulary is built by create-vocab-fn, which defaults to create-vocab-all.
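The two-step idea described above (build a vocabulary, then apply a per-row conversion function) can be sketched in plain Clojure. This is only an illustration under assumptions: it uses a vector of row maps instead of a dataset, and rows->sparse-and-vocab, the return shape, and the two helper functions passed in are hypothetical, not part of scicloj.ml.smile.nlp.

```clojure
;; Hypothetical sketch, not the library implementation.
;; `rows` is a vector of maps standing in for a dataset.
(defn rows->sparse-and-vocab
  [rows bow-col indices-col bow->sparse-fn create-vocab-fn]
  (let [vocab            (create-vocab-fn (map #(get % bow-col) rows))
        vocab->index-map (zipmap vocab (range))]
    {:rows (mapv (fn [row]
                   (assoc row indices-col
                          (bow->sparse-fn (get row bow-col) vocab->index-map)))
                 rows)
     :vocab->index-map vocab->index-map}))

;; Assumed helpers: use every token as vocabulary, and keep the sorted
;; indices of the tokens that are present in a bag.
(defn all-tokens [bows]
  (->> bows (mapcat keys) distinct sort))

(defn present-indices [bow vocab->index-map]
  (->> (keys bow) (keep vocab->index-map) sort vec))

(def sparse-result
  (rows->sparse-and-vocab [{:bow {"b" 1 "a" 2}} {:bow {"c" 1}}]
                          :bow :idx present-indices all-tokens))
```

Passing the conversion function as an argument mirrors the bow->sparse-fn parameter in the signature above: the same traversal can produce index vectors, count vectors, or any other sparse form.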

bow->sparse-indices^clj

(bow->sparse-indices bow vocab->index-map)

Converts the token frequencies to the sparse vectors needed by Maxent.
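As a rough illustration of what such a conversion might look like, the sketch below maps the tokens of one bag to their vocabulary indices. It assumes that the sparse vector is the sorted collection of indices of the tokens present in the document and that out-of-vocabulary tokens are dropped. The name bow->index-vector is hypothetical, and the library may represent the result differently (for example as an int array).

```clojure
;; Hypothetical sketch, not the library implementation.
;; Tokens missing from the vocabulary are silently dropped.
(defn bow->index-vector [bow vocab->index-map]
  (->> (keys bow)
       (keep vocab->index-map) ; a map used as a function returns nil for unknown keys
       sort
       vec))

(def example-indices
  (bow->index-vector {"cat" 2 "dog" 1 "zebra" 1}
                     {"cat" 0 "dog" 3 "fish" 2}))
;; example-indices => [0 3]
```

Wrapping the result in int-array would give a primitive array if a Java API requires one.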

bow->tfidf^clj

(bow->tfidf ds bow-column tfidf-column)

Calculates the tf-idf score from the bags of words (token frequency maps) in column bow-column and stores the scores in a new column tfidf-column as maps of token -> tfidf-score.
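To make the token -> tfidf-score maps concrete, here is a minimal sketch of one common textbook tf-idf variant: relative term frequency multiplied by the natural log of (number of documents / number of documents containing the term). The exact weighting used by this library is not stated in the docstring, so the formula and the helper names below are assumptions.

```clojure
;; Hypothetical sketch of a textbook tf-idf, not the library implementation.
(defn term-frequency
  "Share of `term` among all token occurrences in `bow`."
  [term bow]
  (/ (double (get bow term 0))
     (reduce + (vals bow))))

(defn inverse-document-frequency
  "Natural log of (documents / documents containing `term`)."
  [term bows]
  (let [n-docs       (count bows)
        n-containing (count (filter #(contains? % term) bows))]
    (Math/log (/ (double n-docs) n-containing))))

(defn bow->tfidf-map
  "Maps every token of `bow` to its tf-idf score. `bow` must be one of `bows`."
  [bow bows]
  (into {}
        (map (fn [term]
               [term (* (term-frequency term bow)
                        (inverse-document-frequency term bows))]))
        (keys bow)))

(def example-bows [{"a" 1 "b" 1} {"a" 2}])
(def example-scores (bow->tfidf-map (first example-bows) example-bows))
```

In this example "a" occurs in every document, so its score is 0.0, while "b" occurs in only one of the two documents and scores 0.5 * ln 2.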

count-vectorize^clj

(count-vectorize ds text-col bow-col)

(count-vectorize
  ds
  text-col
  bow-col
  {:keys [text->bow-fn] :or {text->bow-fn default-text->bow} :as options})

Converts the text column text-col to a bag-of-words representation in the form of a frequency-count map, stored in column bow-col. The conversion function can be replaced through the text->bow-fn option, which defaults to default-text->bow.
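The frequency-count map mentioned above can be illustrated without the library. The sketch below assumes a very naive tokenizer (lower-casing and splitting on non-letter characters) and uses a vector of row maps instead of a dataset. Both naive-text->bow and add-bow-column are hypothetical names.

```clojure
(require '[clojure.string :as str])

;; Hypothetical sketch, not the library implementation.
(defn naive-text->bow
  "Lower-cases `text`, splits on non-letters, and counts the tokens."
  [text]
  (->> (str/split (str/lower-case text) #"[^a-z]+")
       (remove str/blank?)
       frequencies))

(defn add-bow-column
  "Adds `bow-col` to every row by applying `text->bow-fn` to `text-col`."
  [rows text-col bow-col text->bow-fn]
  (mapv #(assoc % bow-col (text->bow-fn (get % text-col))) rows))

(def example-rows
  (add-bow-column [{:text "The cat sat on the mat"}]
                  :text :bow naive-text->bow))
;; (:bow (first example-rows))
;; => {"the" 2, "cat" 1, "sat" 1, "on" 1, "mat" 1}
```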

create-vocab-all^clj

(create-vocab-all bow)

Uses all tokens as the vocabulary

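One way to picture an "all tokens" vocabulary is to collect the distinct keys of every bag and assign each an index. The sketch below is an assumption about the idea only: the name all-tokens-vocab, the sorted order, and the returned map shape are hypothetical, and the real function may return something different.

```clojure
;; Hypothetical sketch, not the library implementation.
(defn all-tokens-vocab
  "Collects every distinct token of `bows` and numbers them in sorted order."
  [bows]
  (let [tokens (->> bows (mapcat keys) distinct sort vec)]
    {:vocab            tokens
     :vocab->index-map (zipmap tokens (range))}))

(def example-vocab
  (all-tokens-vocab [{"a" 1 "b" 2} {"b" 1 "c" 1}]))
;; example-vocab
;; => {:vocab ["a" "b" "c"], :vocab->index-map {"a" 0, "b" 1, "c" 2}}
```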

default-text->bow^clj

(default-text->bow text options)

Converts text to token counts (a map token -> count). Takes options:
stopwords: either a keyword naming a default Smile dictionary (:default, :google, :comprehensive, :mysql) or a seq of stop words.
stemmer: either :none or :porter, where :porter selects the Porter stemmer.
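The effect of the two options can be sketched in plain Clojure. The example below is an illustration under assumptions: it accepts only a seq of stop words (not the Smile dictionary keywords), takes an arbitrary stem-fn in place of the Porter stemmer, and removes stop words before stemming. The name text->bow-sketch and the :stem-fn key are hypothetical, and the library may order these steps differently.

```clojure
(require '[clojure.string :as str])

;; Hypothetical sketch, not the library implementation.
(defn text->bow-sketch
  "Counts tokens of `text` after dropping stop words and applying `stem-fn`."
  [text {:keys [stopwords stem-fn]
         :or   {stopwords [] stem-fn identity}}]
  (->> (str/split (str/lower-case text) #"[^a-z]+")
       (remove str/blank?)
       (remove (set stopwords)) ; a set used as a predicate
       (map stem-fn)
       frequencies))

(def example-bow
  (text->bow-sketch "The cat and the hat" {:stopwords ["the" "and"]}))
;; example-bow => {"cat" 1, "hat" 1}
```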

default-tokenize^clj

(default-tokenize text options)

Tokenizes text. Whether a stemmer is used can be configured with the :stemmer option.

freqs->SparseArray^clj

(freqs->SparseArray freq-map vocab->index-map)

idf^clj

(idf tf-map term bows)

resolve-stopwords^clj

(resolve-stopwords stopwords-option)

tf^clj

(tf term bow)

tf-map^clj

(tf-map bows)

tfidf^clj

(tfidf tf-map term bow bows)

word-process^clj

(word-process stemmer normalizer word)
