tech — techascent/tech.ml.dataset 2.11

Handling categorical data in a dataset involves two mapping systems. The first maps each category to an integer within the same column. The second is a 'one-hot' encoding, which generates additional columns that each hold a reduced number of possible categories, usually one categorical value per column.
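
The two mappings described above can be sketched with clojure.core alone. This is only an illustration of the idea, not the API of tech.ml.dataset.categorical: the names category->int-map, encode-column and one-hot are hypothetical, and columns are modelled as plain vectors.

```clojure
;; Hypothetical helpers, written with clojure.core only. These are NOT the
;; functions exported by tech.ml.dataset.categorical.

(defn category->int-map
  "Assigns each distinct value an integer, in order of first appearance."
  [values]
  (zipmap (distinct values) (range)))

(defn encode-column
  "Replaces each category in the column with its integer code."
  [lookup values]
  (mapv lookup values))

(defn one-hot
  "Expands one categorical column into one 0/1 column per category."
  [values]
  (into {}
        (for [category (distinct values)]
          [category (mapv #(if (= % category) 1 0) values)])))

(def colors ["red" "green" "red" "blue"])
(def color-lookup (category->int-map colors))

(encode-column color-lookup colors)
;; => [0 1 0 2]

(one-hot colors)
;; => {"red" [1 0 1 0], "green" [0 1 0 0], "blue" [0 0 0 1]}
```

The integer mapping keeps a single column and needs the lookup table to be stored so it can be inverted later. The one-hot form needs no lookup table but adds one column per category.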

tech.ml.dataset.column

tech.ml.dataset.dynamic-int-list

An int-list implementation that resizes its backing store as it is required to hold wider data.
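
As a rough illustration of a list that widens its backing store on demand, the sketch below starts with a byte array and moves to short, int or long arrays once a value no longer fits. All names here (widths, dynamic-int-list, add-value!, storage-width, list-values) are hypothetical, and the initial capacity of 4 and the doubling growth policy are assumptions; the real namespace may be organized quite differently.

```clojure
;; Hypothetical sketch, single-threaded use only. Not the library's code.

(def widths
  "Storage widths in ascending order with the inclusive range each can hold."
  [{:name :int8  :lo Byte/MIN_VALUE    :hi Byte/MAX_VALUE    :make byte-array  :put aset-byte}
   {:name :int16 :lo Short/MIN_VALUE   :hi Short/MAX_VALUE   :make short-array :put aset-short}
   {:name :int32 :lo Integer/MIN_VALUE :hi Integer/MAX_VALUE :make int-array   :put aset-int}
   {:name :int64 :lo Long/MIN_VALUE    :hi Long/MAX_VALUE    :make long-array  :put aset-long}])

(defn narrowest-width
  "Index into `widths` of the narrowest storage able to hold `v`."
  [v]
  (first (keep-indexed (fn [i {:keys [lo hi]}] (when (<= lo v hi) i)) widths)))

(defn dynamic-int-list
  "Creates an empty list backed by a small byte array (assumed capacity: 4)."
  []
  (atom {:level 0 :data (byte-array 4) :n 0}))

(defn rebuild
  "Copies the first n values of `data` into a fresh array of the given
  width level and capacity."
  [data n level capacity]
  (let [{:keys [make put]} (widths level)
        fresh (make capacity)]
    (dotimes [i n] (put fresh i (nth data i)))
    fresh))

(defn add-value!
  "Appends `v`, first widening and/or growing the backing array if needed."
  [lst v]
  (let [{:keys [level data n]} @lst
        capacity     (count data)
        new-level    (max level (narrowest-width v))
        new-capacity (if (= n capacity) (* 2 capacity) capacity)
        new-data     (if (and (= level new-level) (= capacity new-capacity))
                       data
                       (rebuild data n new-level new-capacity))]
    ((:put (widths new-level)) new-data n v)
    (reset! lst {:level new-level :data new-data :n (inc n)})
    lst))

(defn storage-width [lst] (:name (widths (:level @lst))))

(defn list-values [lst]
  (let [{:keys [data n]} @lst]
    (mapv #(long (nth data %)) (range n))))

(def xs (dynamic-int-list))
(doseq [v [1 2 3]] (add-value! xs v))
(storage-width xs)   ;; => :int8
(add-value! xs 70000)
(storage-width xs)   ;; => :int32
(list-values xs)     ;; => [1 2 3 70000]
```

The appeal of this design is that small values cost one byte each, and the wider representation is paid for only after a value that needs it actually arrives.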

tech.ml.dataset.format-sequence

This code was initially provided by genmeblog after careful consideration of R's print code.

tech.ml.dataset.impl.column

tech.ml.dataset.impl.dataset

tech.ml.dataset.join

tech.ml.dataset.math

tech.ml.dataset.modelling

tech.ml.dataset.options

The etl pipeline and dataset operators are built to produce a metadata options map. Their API access to the options is centralized in this file.

tech.ml.dataset.parallel-unique

parallel-unique

tech.ml.dataset.parse

This file really should be named univocity.clj. It handles parsing and writing CSV and TSV data.

tech.ml.dataset.parse.datetime

tech.ml.dataset.parse.mapseq

Sequences of maps are perhaps the most basic pure data structure for data. Converting them into a more structured form (and back) is a key component of dealing with datasets.
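
The conversion in both directions can be illustrated with plain Clojure maps and vectors. The sketch below is not the namespace's implementation: mapseq->columns and columns->mapseq are hypothetical names, the "structured form" is modelled as a map of column name to vector, and a key missing from a row is assumed to become nil in that column.

```clojure
;; Hypothetical helpers using clojure.core only.

(defn mapseq->columns
  "Turns a sequence of row maps into a map of column name -> vector of values.
  Keys missing from a row become nil in that column."
  [rows]
  (let [column-names (distinct (mapcat keys rows))]
    (into {}
          (for [column-name column-names]
            [column-name (mapv #(get % column-name) rows)]))))

(defn columns->mapseq
  "Rebuilds one map per row from a non-empty map of equal-length columns."
  [columns]
  (let [column-names (keys columns)]
    (apply mapv
           (fn [& row-values] (zipmap column-names row-values))
           (vals columns))))

(def rows [{:name "a" :age 1}
           {:name "b" :age 2}])

(mapseq->columns rows)
;; => {:name ["a" "b"], :age [1 2]}

(columns->mapseq (mapseq->columns rows))
;; => [{:name "a", :age 1} {:name "b", :age 2}]
```

The round trip is exact only when every row has the same keys. Rows with missing keys come back with explicit nil entries under this sketch's assumption.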

tech.ml.dataset.parse.name-values-seq

tech.ml.dataset.parse.spreadsheet

Spreadsheets in general are stored in a cell-based format. This means that any cell could have data of any type. Commonalities around parsing spreadsheet-type systems are captured here.
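
Because any cell can hold any type, a parser has to settle on one datatype per column after looking at the cells. The sketch below shows one possible policy and is an assumption, not the library's behavior: integers mixed with floating-point numbers widen to :float64, any other mixture falls back to :string, and nil cells are treated as missing. The names cell-type, column-type and sheet-column-types are hypothetical.

```clojure
;; Hypothetical type-inference sketch using clojure.core only.

(defn cell-type
  "Classifies a single cell value."
  [cell]
  (cond
    (nil? cell)     :missing
    (boolean? cell) :boolean
    (integer? cell) :int64
    (number? cell)  :float64
    (string? cell)  :string
    :else           :object))

(defn column-type
  "Picks one datatype for a column from the types of its non-missing cells."
  [cells]
  (let [types (disj (set (map cell-type cells)) :missing)]
    (cond
      (empty? types)                    :missing
      (= 1 (count types))               (first types)
      (every? #{:int64 :float64} types) :float64
      :else                             :string)))

(defn sheet-column-types
  "Takes rows of cells whose first row is the header and returns a map of
  column name -> inferred datatype."
  [[header & body]]
  (zipmap header (map column-type (apply map vector body))))

(def sheet-rows
  [["id" "score" "note" "flag"]
   [1    9.5     "ok"   true]
   [2    7       nil    "yes"]
   [3    8       "late" false]])

(sheet-column-types sheet-rows)
;; => {"id" :int64, "score" :float64, "note" :string, "flag" :string}
```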

tech.ml.dataset.pca

PCA and K-PCA using smile implementations.

tech.ml.dataset.pipeline

A set of common 'pipeline' operations you probably will want to run on a dataset.

Conversion mechanisms from dataset to tensor and back

set-log-level

tech.libs.fastexcel

tech.libs.poi

tech.libs.smile.data

tech.ml.dataset

tech.ml.dataset.base

tech.ml.dataset.categorical

tech.ml.dataset.column

tech.ml.dataset.dynamic-int-list

tech.ml.dataset.format-sequence

tech.ml.dataset.impl.column

tech.ml.dataset.impl.dataset

tech.ml.dataset.join

tech.ml.dataset.math

tech.ml.dataset.modelling

tech.ml.dataset.options

tech.ml.dataset.parallel-unique

tech.ml.dataset.parse

tech.ml.dataset.parse.datetime

tech.ml.dataset.parse.mapseq

tech.ml.dataset.parse.name-values-seq

tech.ml.dataset.parse.spreadsheet

tech.ml.dataset.pca

tech.ml.dataset.pipeline

tech.ml.dataset.pipeline.base

tech.ml.dataset.pipeline.column-filters

tech.ml.dataset.pipeline.pipeline-operators

tech.ml.dataset.print

tech.ml.dataset.readers

tech.ml.dataset.string-table

tech.ml.dataset.svm

tech.ml.dataset.tensor

tech.ml.dataset.text

tech.ml.dataset.text.bag-of-words

tech.ml.protocols.column

tech.ml.protocols.dataset

tech.ml.protocols.etl

tech.ml.utils

tech.ml.utils.slf4j-log-level