zensols — com.zensols.ml/dataset 0.0.12

zensols.dataset.db

Precompute a dataset (i.e. features from natural language utterances) and store it in Elasticsearch. This is useful for training, testing, validating, and developing machine learning models.

The unit of data is an instance. An instance set (or just instances) makes up the dataset.

The goal is to abstract away Elasticsearch, though doing so fully may be a future enhancement. At the moment, functions don't carry Elasticsearch artifacts, but those artifacts are still exposed.

There are three basic ways to use this data:

* Get all instances (i.e. an utterance or a feature set). In this case all data returned from ids is considered training data. This is the default nascent state.
* Split the data into a train and test set (see divide-by-set).
* Use the data for cross-fold validation and iterate over the folds (see divide-by-fold).

The information used to represent either the folds or the test/train split is referred to as the dataset split state and is stored in Elasticsearch under a different mapping-type in the same index as the instances.

See ids for more information.
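
The two split styles described above can be sketched with plain `clojure.core` functions. This is only an illustration of the idea, assuming the instance IDs are available as an ordered sequence of strings; `split-by-ratio` and `folds` are hypothetical names, not the library's divide-by-set or divide-by-fold, and nothing here touches Elasticsearch or stores any split state.

```clojure
;; Hypothetical helpers for illustration only. They are NOT the library's
;; divide-by-set / divide-by-fold and do not persist any split state.

(defn split-by-ratio
  "Split a sequence of instance IDs into :train and :test, putting the first
  `train-ratio` fraction of the IDs (rounded) into :train."
  [ids train-ratio]
  (let [n (Math/round (double (* train-ratio (count ids))))
        [train held-out] (split-at n ids)]
    {:train (vec train) :test (vec held-out)}))

(defn folds
  "Return `k` train/test splits. Fold i uses every ID whose position modulo k
  equals i as test data and all remaining IDs as training data."
  [ids k]
  (let [indexed (map-indexed vector ids)
        pick    (fn [keep?] (vec (for [[idx id] indexed :when (keep? idx)] id)))]
    (for [i (range k)]
      {:test  (pick #(= i (mod % k)))
       :train (pick #(not= i (mod % k)))})))

;; Made-up IDs, used only to exercise the helpers.
(def example-ids ["1" "2" "3" "4" "5" "6" "7" "8" "9" "10"])

(split-by-ratio example-ids 0.8)
;; => {:train ["1" "2" "3" "4" "5" "6" "7" "8"], :test ["9" "10"]}

(:test (first (folds example-ids 5)))
;; => ["1" "6"]
```

The sketch keeps the IDs in their given order so the result is deterministic; a real split would typically shuffle them first.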

zensols.dataset.elsearch

A simple client wrapper around an Elasticsearch wrapper. You probably want to use the more client-friendly zensols.dataset.db.

zensols.dataset.thaw

Exactly like zensols.dataset.db, but uses the file system.

Instead of using Elasticsearch, this namespace uses the rows of a JSON file created with zensols.dataset.db/freeze-dataset. The file can be created by any program since it's just a text file with the following keys:

* **:instance**: the data instance, i.e. the parsed data (see zensols.dataset.db)
* **:class-label**: the label of the class for the data instance
* **:id**: the unique string ID of the instance
* **:set-type**: either `train` or `test`, depending on the set type
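
As a rough sketch of how rows with these keys might be consumed, the snippet below checks that each row carries all four keys and groups the rows by set type. It assumes the rows have already been parsed into Clojure maps with keyword keys and that `:set-type` holds the string `"train"` or `"test"`; the exact on-disk layout of the file is not shown here. The names `required-keys`, `valid-row?`, and `rows-by-set-type` are hypothetical, and the example rows are made up.

```clojure
;; Hypothetical helpers for illustration; not part of zensols.dataset.thaw.

(def required-keys [:instance :class-label :id :set-type])

(defn valid-row?
  "True when the row map contains every key listed above."
  [row]
  (every? #(contains? row %) required-keys))

(defn rows-by-set-type
  "Drop rows missing a required key, then group the rest by :set-type."
  [rows]
  (group-by :set-type (filter valid-row? rows)))

;; Made-up rows; the shape of :instance is an assumption.
(def example-rows
  [{:instance {:text "hello there"}   :class-label "greeting" :id "1" :set-type "train"}
   {:instance {:text "see you later"} :class-label "farewell" :id "2" :set-type "test"}
   {:instance {:text "good morning"}  :class-label "greeting" :id "3" :set-type "train"}])

(map :id (get (rows-by-set-type example-rows) "train"))
;; => ("1" "3")
```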

zensols.dataset.db

zensols.dataset.elsearch

zensols.dataset.thaw

zensols.dataset.version