
tech.ml.dataset.text.bag-of-words


parse-token-column

(parse-token-column token-table tokenizer col-data)

path->dataset-master-token-table

(path->dataset-master-token-table path bag-of-words-colname)
(path->dataset-master-token-table path
                                  bag-of-words-colname
                                  {:keys [tokenizer]
                                   :or {tokenizer (simple-tokenizer-fn)}})

Parses a file and returns a map of {:dataset :token-table}, where token-table is a map of tokens to counts. In the returned dataset, the column that held the original text holds a SHA-256 hash of that text instead.
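The two ideas in the docstring above, a token table that maps tokens to counts and a SHA-256 hash standing in for the original text, can be illustrated in plain Clojure. This is a minimal sketch that does not call this namespace: `tokenize`, `token-table`, and `sha256-hex` are hypothetical names, and the lower-casing, split-on-non-word-characters rule is an assumption, not necessarily what `simple-tokenizer-fn` does.

```clojure
(require '[clojure.string :as str])
(import '(java.security MessageDigest)
        '(java.nio.charset StandardCharsets))

(defn tokenize
  "Hypothetical tokenizer (an assumption for illustration): lower-cases the
  text and splits it on runs of non-word characters."
  [^String text]
  (->> (str/split (str/lower-case text) #"\W+")
       (remove str/blank?)))

(defn token-table
  "Builds a map of token -> count over a sequence of text strings."
  [texts]
  (frequencies (mapcat tokenize texts)))

(defn sha256-hex
  "Returns the SHA-256 digest of a string (UTF-8 encoded) as lower-case hex."
  [^String s]
  (let [digest (.digest (MessageDigest/getInstance "SHA-256")
                        (.getBytes s StandardCharsets/UTF_8))]
    (apply str (map #(format "%02x" %) digest))))

;; Usage: count tokens across rows, then hash each row's text.
(def rows ["The cat saw the dog." "A dog barked."])
(def table (token-table rows))
(def hashed-rows (mapv sha256-hex rows))
```

Storing a fixed-length hash in place of the text keeps a stable per-row identifier without carrying the full text alongside the token counts.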

path-token-map->bag-of-words

(path-token-map->bag-of-words path bag-of-words-colname token->idx-map)
(path-token-map->bag-of-words path
                              bag-of-words-colname
                              token->idx-map
                              {:keys [tokenizer]
                               :or {tokenizer (simple-tokenizer-fn)}})

sha256

(sha256 string)

simple-tokenizer-fn

(simple-tokenizer-fn)

sum-bifun
