Preemptively compute a dataset (i.e. features from natural language utterances) and store them in Elasticsearch. This is useful for training, testing, validating, and developing machine learning models.

The unit of data is an instance. An instance set (or just *instances*) makes up the dataset.

The goal is to abstract Elasticsearch away entirely, but that may be a future enhancement. At the moment, functions don't carry Elasticsearch artifacts, but those artifacts are still exposed.

There are three basic ways to use this data:

* Get all instances (i.e. an utterance or a feature set). In this case all data returned from [[ids]] is considered training data. This is the default nascent state.
* Split the data into a train and test set (see [[divide-by-set]]).
* Use the data for cross fold validation and iterate over the folds (see [[divide-by-fold]]).

The information used to represent either the folds or the train/test split is referred to as the *dataset split* state and is stored in Elasticsearch under a different mapping type in the same index as the instances.

See [[ids]] for more information.
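The three usage modes can be pictured with a small, self-contained sketch over a vector of instance IDs. This is a hypothetical illustration only, not how [[divide-by-set]] or [[divide-by-fold]] are implemented: the names `default-split`, `split-train-test`, `folds`, and the `train-ratio` parameter are assumptions, and nothing here touches Elasticsearch or the stored dataset split state.

```clojure
(ns example.split-sketch
  "Hypothetical sketch of the three usage modes; not this library's implementation.")

;; Mode 1 (default nascent state): with no split state, every instance ID
;; is treated as training data.
(defn default-split
  [ids]
  {:train (vec ids) :test []})

;; Mode 2: a train/test split. `train-ratio` (hypothetical) is the fraction
;; of IDs assigned to the training set; the rest become the test set.
(defn split-train-test
  [ids train-ratio]
  (let [n-train (Math/round (double (* (count ids) train-ratio)))
        [train test] (split-at n-train ids)]
    {:train (vec train) :test (vec test)}))

;; Mode 3: k-fold cross validation. IDs are assigned round-robin to k
;; partitions; fold i uses partition i as the test set and all other
;; partitions as the training set.
(defn folds
  [ids k]
  (let [indexed (map-indexed vector ids)]
    (for [i (range k)]
      {:test  (vec (for [[idx id] indexed :when (= i (mod idx k))] id))
       :train (vec (for [[idx id] indexed :when (not= i (mod idx k))] id))})))
```

The split is kept deterministic for readability; a real split would usually shuffle the IDs first (e.g. with `shuffle`) so that each set is a random sample.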
A simple *client* wrapper for an Elasticsearch wrapper. You probably want to use the more client-friendly [[zensols.dataset.db]].
Exactly like [[zensols.dataset.db]] but uses the file system. Instead of using Elasticsearch, it uses the rows of a JSON file created with [[zensols.dataset.db/freeze-dataset]]. The file can be created by any program since it's just a text file with the following keys:

* **:instance**: the parsed data instance (see [[zensols.dataset.db]])
* **:class-label**: the label of the class for the data instance
* **:id**: the unique string ID of the instance
* **:set-type**: either `train` or `test`, depending on the set type
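As a hypothetical sketch of the row format, the following assumes each row has already been parsed from JSON into a Clojure map with keyword keys. The JSON parsing step, the sample values, and the helper names `valid-row?`, `rows-for-set`, and `labeled-instances` are illustrative assumptions, not part of this library's API.

```clojure
(ns example.frozen-rows
  "Hypothetical sketch: in-memory shape of rows from a frozen dataset file.")

;; Assumption: rows were parsed from JSON into maps with keyword keys.
;; The instance values and labels below are made up for illustration.
(def rows
  [{:instance "a parsed utterance" :class-label "greeting" :id "1" :set-type "train"}
   {:instance "another utterance"  :class-label "question" :id "2" :set-type "test"}
   {:instance "a third utterance"  :class-label "greeting" :id "3" :set-type "train"}])

;; The four keys every row carries.
(def required-keys [:instance :class-label :id :set-type])

;; A row is well formed when it has all keys, a string ID, and a set type
;; of either "train" or "test".
(defn valid-row?
  [row]
  (and (every? #(contains? row %) required-keys)
       (string? (:id row))
       (contains? #{"train" "test"} (:set-type row))))

;; Select the rows belonging to one set type ("train" or "test").
(defn rows-for-set
  [rows set-type]
  (filterv #(= set-type (:set-type %)) rows))

;; Pair each instance with its class label, e.g. as classifier input.
(defn labeled-instances
  [rows]
  (mapv (juxt :instance :class-label) rows))
```

Because `:set-type` is stored per row, the train/test split travels with the file itself, so no separate split state is needed when reading it back.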