zensols.dataset.db


Preemptively compute a dataset (i.e. features from natural language
utterances) and store them in Elasticsearch.  This is useful for training,
testing, validating, and developing machine learning models.

The unit of data is an instance.  An instance set (or just *instances*) makes
up the dataset.

The idea is to abstract out Elasticsearch, but that might be a future
enhancement.  At the moment functions don't carry Elasticsearch artifacts but
they are exposed.

There are three basic ways to use this data:

* Get all instances (i.e. an utterance or a feature set).  In this case all
  data returned from [[ids]] is considered training data.  This is the default
  nascent state.
* Split the data into a train and test set (see [[divide-by-set]]).
* Use the data as a cross fold validation and iterate
  folds (see [[divide-by-fold]]).

The information used to represent either the fold or the test/train split is
referred to as the *dataset split* state and is stored in Elasticsearch under
a different mapping-type in the same index as the instances.

See [[ids]] for more information.
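
A minimal sketch of the three modes (assuming the namespace is referred in;
the index name is hypothetical):

```clojure
(with-connection (elasticsearch-connection "example-index")
  ;; nascent state: everything returned is training data
  (instances)
  ;; test/train split: 75% train, 25% test
  (divide-by-set 0.75)
  (ids :set-type :test)
  ;; cross fold validation: 10 folds, iterated with set-fold
  (divide-by-fold)
  (set-fold 0)
  (instances :set-type :train))
```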

class-label-keyclj


clearclj

(clear & {:keys [wipe-persistent?] :or {wipe-persistent? false}})

Clear the in-memory instance data.  If key `:wipe-persistent?` is `true` all
fold and test/train split data is also cleared.

dataset-fileclj

(dataset-file)

default-connection-instclj


distributionclj

(distribution)

Return maps representing the dataset distribution by class label.  Each
element of the returned sequence has the following keys:

* **:class-label** the class label of the instances
* **:count** the number of instances for **:class-label**
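
For example, a connection loaded with one instance per class label might
return something like the following (a sketch; exact ordering may differ):

```clojure
(distribution)
;; => ({:class-label "class 0" :count 1}
;;     {:class-label "class 1" :count 1}
;;     ...)
```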

divide-by-foldclj

(divide-by-fold)
(divide-by-fold folds & {:keys [shuffle?] :or {shuffle? true}})

Divide the data into folds and initialize the current fold in the *dataset
split* state.  Using this kind of dataset split is useful for cross fold
validation.

* **folds** number of folds to use, which defaults to 10

See [[set-fold]]
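
A typical cross fold validation loop might look like the following sketch,
where `train-and-test` is a hypothetical evaluation function:

```clojure
(let [folds 10]
  (divide-by-fold folds)
  (doseq [fold (range folds)]
    (set-fold fold)
    ;; evaluate on the current fold's train/test buckets
    (train-and-test (instances :set-type :train)
                    (instances :set-type :test))))
```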

divide-by-presetclj

(divide-by-preset)

Divide the data into test and training *buckets*.  The respective train/test
buckets are dictated by the `:set-type` label passed as a parameter to the
**:create-instances-fn** function as documented in [[elasticsearch-connection]].
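
For this to work the loader must use the four argument form of the add
function so each instance carries its set type.  A sketch (IDs, utterances
and class labels are hypothetical):

```clojure
(defn- create-preset-connection []
  (letfn [(load-fn [add-fn]
            ;; the fourth argument presorts each instance into a bucket
            (add-fn "1" "a training utterance" "class-a" :train)
            (add-fn "2" "a testing utterance" "class-a" :test))]
    (elasticsearch-connection "tmp" :create-instances-fn load-fn)))

(with-connection (create-preset-connection)
  (instances-load)
  (divide-by-preset)
  (ids :set-type :test))
```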

divide-by-setclj

(divide-by-set)
(divide-by-set train-ratio
               &
               {:keys [dist-type shuffle? max-instances seed]
                :as opts
                :or {shuffle? true dist-type (quote uneven)}})

Divide the dataset into test and training *buckets*.

* **train-ratio** the proportion of data in the train bucket, which defaults
to `0.5`

Keys
----
* **:dist-type** one of the following symbols:
    * *even*: each test/training set has an even distribution by class label
    * *uneven*: each test/training set has an uneven distribution by class label
* **:shuffle?** if `true` then shuffle the set before partitioning, otherwise
just update the *demarcation* boundary
* **:filter-fn** if given, a filter function that takes a key as input
* **:max-instances** the maximum number of instances per class
* **:seed** if given, seed the random number generator, otherwise don't
return random documents
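
For example, a minimal sketch of an 80/20 split with an even class
distribution and a reproducible shuffle:

```clojure
(divide-by-set 0.8
               :dist-type 'even
               :max-instances 1000
               :seed 1)
```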

elasticsearch-connectionclj

(elasticsearch-connection index-name
                          &
                          {:keys [create-instances-fn population-use set-type
                                  url mapping-type-def cache-inst]
                           :or {create-instances-fn identity
                                population-use 1.0
                                set-type :train
                                mapping-type-def {instance-key {:type "nested"}
                                                  class-label-key
                                                    {:type "string"
                                                     :index "not_analyzed"}}
                                url "http://localhost:9200"}})

Create a connection to the dataset DB cache.

Parameters
----------
* **index-name** the name of the Elasticsearch index

Keys
----
* **:create-instances-fn** a function that computes the instance
set (i.e. parses the utterance) and is invoked by [[instances-load]]; this
function takes a single argument, which is itself a function used to load
utterances into the DB; that function takes one of the following forms:
    * (fn [instance class-label] ...
    * (fn [id instance class-label] ...
    * (fn [id instance class-label set-type] ...
        * **id** the unique identifier of the data point
        * **instance** the dataset instance (can be an `N`-deep map)
        * **class-label** the label of the class (can be nominal, double, integer)
        * **set-type** either `:test`, `:train`, `:train-test` (all) used to presort the data
        with [[divide-by-preset]]; note that it isn't necessary to
        call [[divide-by-preset]] for the first invocation of [[instances-load]]
* **:url** the URL to the DB (defaults to `http://localhost:9200`)
* **:mapping-type** map type name (see the Elasticsearch documentation)
* **:cache-inst** an atom used to cache instances by ID; if given, this
  retrieves instances from the in-memory map stored in the atom; otherwise it
  goes to Elasticsearch each time

Example
-------
Create a connection that produces a list of 20 instances:
```clojure
(defn- create-iter-connection []
  (letfn [(load-fn [add-fn]
            (doseq [i (range 20)]
              (add-fn (str i) (format "inst %d" i) (format "class %d" i))))]
    (elasticsearch-connection "tmp" :create-instances-fn load-fn)))
```
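
The connection can then be used with [[with-connection]] to load and query
the instances, as in this sketch:

```clojure
(with-connection (create-iter-connection)
  (instances-load)
  (instance-count))
;; => 20
```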

freeze-datasetclj

(freeze-dataset &
                {:keys [output-file id-key set-type-key]
                 :or {set-type-key :set-type}})

Distill the current dataset (data and test/train splits) to
**output-file**.  See [[freeze-dataset-to-writer]].

freeze-dataset-to-writerclj

(freeze-dataset-to-writer writer & {:keys [set-type-key]})

Distill the current dataset (data and test/train splits) to **writer** to be
later restored with [[zensols.dataset.thaw/thaw-connection]].
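
A minimal sketch, assuming a default connection is set (the file name is
hypothetical):

```clojure
(require '[clojure.java.io :as io])

(with-open [writer (io/writer "dataset.dat")]
  (freeze-dataset-to-writer writer))
```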

freeze-fileclj

(freeze-file)

id-keyclj


idsclj

(ids & {:keys [set-type]})

Return all IDs based on the *dataset split* (see the namespace docs).

Keys
----
* **:set-type** is either `:train`, `:test`, `:train-test` (all) and defaults
to [[set-default-set-type]] or `:train` if not set
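
For example (a sketch, assuming a split created with [[divide-by-set]]):

```clojure
;; IDs in the current training bucket (the default set type)
(ids)

;; IDs in the test bucket
(ids :set-type :test)
```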

instance-by-idclj

(instance-by-id id)
(instance-by-id conn id)

Get a specific instance by its ID.

This returns a map that has the following keys:

* **:instance** the instance data, which was set with
**:create-instances-fn** in [[elasticsearch-connection]]

instance-countclj

(instance-count)

Get the total number of instances in the database.  This result is
independent of the *dataset split* state.

instance-keyclj


instancesclj

(instances & {:keys [set-type include-ids? id-set]})

Return all instance data based on the *dataset split* (see the namespace
docs).

See [[instance-by-id]] for the data in each map of the returned sequence.

Keys
----
* **:set-type** is either `:train`, `:test`, `:train-test` (all) and defaults
to [[set-default-set-type]] or `:train` if not set
* **:include-ids?** if non-`nil` return keys in the map as well
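
For example, a sketch that keys each returned map by its instance ID:

```clojure
(instances :set-type :train :include-ids? true)
```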

instances-by-class-labelclj

(instances-by-class-label &
                          {:keys [max-instances type seed]
                           :or {max-instances Integer/MAX_VALUE}})

Return a map with class labels for keys and the corresponding instances for
each class label.

Keys
----
* **:max-instances** the maximum number of instances per class
* **:seed** if given, seed the random number generator, otherwise don't
return random documents
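
A sketch that samples reproducibly:

```clojure
;; at most 100 randomly selected instances per class label
(instances-by-class-label :max-instances 100 :seed 1)
```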

instances-countclj

(instances-count)

Return the number of datasets in the DB.

instances-loadclj

(instances-load & {:keys [recreate-index?] :or {recreate-index? true}})

Parse and load the dataset in the DB.

set-default-connectionclj

(set-default-connection)
(set-default-connection conn)

Set the default connection.

Parameter **conn** is used in place of what is set with [[with-connection]].
This is very convenient and saves typing, but will get clobbered if
a [[with-connection]] is used further down in the stack frame.

If the parameter is missing, it's unset.
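
A REPL sketch (the connection constructor is the example
from [[elasticsearch-connection]]):

```clojure
;; subsequent calls need no surrounding with-connection
(set-default-connection (create-iter-connection))
(instance-count)

;; unset it again
(set-default-connection)
```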

set-default-set-typeclj

(set-default-set-type set-type)

Set the default bucket (training or testing) to get data.

* **:set-type** is either `:train` (default) or `:test`;
see [[elasticsearch-connection]]

See [[ids]]

set-foldclj

(set-fold fold)

Set the current fold in the *dataset split* state.

You must call [[divide-by-fold]] before calling this.

See the namespace docs for more information.

set-population-useclj

(set-population-use ratio)

Set how much of the data from the DB to use.  This is useful for cases where
your dataset or corpus is huge and you only want to start with a small chunk
until you get your models debugged.

Parameters
----------
* **ratio** a number in the interval (0, 1]; by default this is 1

**Note** This removes any stored *dataset split* state
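
For example, a sketch that debugs models against a tenth of the corpus:

```clojure
(set-population-use 0.1)
```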

statsclj

(stats)

Get training vs testing *dataset split* statistics.

with-connectionclj/smacro

(with-connection connection & body)

Execute a body with the form `(with-connection connection ...)`.

* **connection** is created with [[elasticsearch-connection]]

write-datasetclj

(write-dataset &
               {:keys [output-file single? instance-fn columns-fn]
                :or {instance-fn identity
                     columns-fn (constantly ["Instance"])}})

Write the dataset to a spreadsheet.  If the file name ends with `.csv` a CSV
file is written, otherwise an Excel file is written.

Keys
----
* **:output-file** where to write the file; defaults to
[[res/resource-path]] `:analysis-report`
* **:single?** if `true` then create a single sheet, otherwise the training
and testing *buckets* are split between sheets
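
A minimal sketch, assuming **:output-file** accepts a `java.io.File` (the
file name is hypothetical):

```clojure
(require '[clojure.java.io :as io])

;; the .csv extension selects CSV output; anything else writes Excel
(write-dataset :output-file (io/file "dataset.csv")
               :single? true)
```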
