Column-major dataset abstraction for efficiently manipulating in-memory datasets.
Handling categorical data in a dataset involves two mapping systems. The first maps each category to an integer within the same column. The second is a 'one-hot' encoding, which generates additional columns, each with a reduced number of possible categories (usually one categorical value per column).
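A minimal sketch of the two mapping systems in plain Python (not the library's Clojure API). It shows a category-to-integer map applied within one column, and a one-hot encoding that produces one 0/1 column per category value. The function names `label_encode` and `one_hot_encode`, the sorted category order, and the `name-category` column naming are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def label_encode(column: List[str]) -> Tuple[List[int], Dict[str, int]]:
    """Map each category to an integer, keeping the data in a single column."""
    # Sorting makes the mapping deterministic (an arbitrary choice for this sketch).
    mapping = {cat: i for i, cat in enumerate(sorted(set(column)))}
    return [mapping[v] for v in column], mapping


def one_hot_encode(name: str, column: List[str]) -> Dict[str, List[int]]:
    """Produce one 0/1 column per distinct category value (hypothetical naming)."""
    categories = sorted(set(column))
    return {
        f"{name}-{cat}": [1 if v == cat else 0 for v in column]
        for cat in categories
    }


colors = ["red", "green", "red", "blue"]
codes, mapping = label_encode(colors)
print(codes)    # → [2, 1, 2, 0]
print(mapping)  # → {'blue': 0, 'green': 1, 'red': 2}
print(one_hot_encode("color", colors))
```

The trade-off is visible here: the integer map keeps a single column but implies an ordering between categories, while one-hot removes that ordering at the cost of one extra column per category.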
An int-list implementation that resizes its backing store whenever it needs to hold wider data.
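One way to picture a store that "holds wider data" is a list that starts with the narrowest integer type and copies itself into a wider one only when a value no longer fits. The sketch below uses Python's `array` type codes as stand-ins for primitive backing stores. The class name `WideningIntList` and the widening order `b` → `h` → `i` → `q` are assumptions, not the library's implementation.

```python
from array import array

# Candidate signed type codes, narrowest first (assumed widening order).
_TYPECODES = ["b", "h", "i", "q"]


def _fits(typecode: str, value: int) -> bool:
    """True if value fits in the signed integer type behind typecode."""
    bits = 8 * array(typecode).itemsize
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


class WideningIntList:
    """Hypothetical int list whose backing array widens on demand."""

    def __init__(self) -> None:
        self._data = array("b")  # start with 1-byte signed integers

    def append(self, value: int) -> None:
        if not _fits(self._data.typecode, value):
            # Pick the narrowest type code that can hold the new value.
            # Any type that fits must be wider than the current one.
            for tc in _TYPECODES:
                if _fits(tc, value):
                    # Copy existing elements into the wider backing store.
                    self._data = array(tc, self._data.tolist())
                    break
            else:
                raise OverflowError(f"{value} does not fit in a 64-bit integer")
        self._data.append(value)

    def __len__(self) -> int:
        return len(self._data)

    def __getitem__(self, index: int) -> int:
        return self._data[index]

    @property
    def typecode(self) -> str:
        return self._data.typecode


ints = WideningIntList()
ints.append(5)       # fits in 'b'
ints.append(300)     # too wide for 'b', widens to 'h'
ints.append(70_000)  # too wide for 'h', widens again
print(ints.typecode, [ints[i] for i in range(len(ints))])
```

Widening is a full copy, so it costs O(n) each time, but it happens at most a few times over the life of the list. In exchange, small-valued data stays compact.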
This code was initially provided by genmeblog after careful consideration of R's print code.
The ETL pipeline and dataset operators are built to produce a metadata options map. Their API for accessing these options is centralized in this file.
This file really should be named univocity.clj, but it is for parsing and writing CSV and TSV data.
Sequences of maps are perhaps the most basic pure data structure for data. Converting them into a more structured form (and back) is a key component of working with datasets.
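The round trip between a sequence of maps and a column-major layout can be sketched in plain Python with dicts of lists standing in for the library's columns. The function names below are hypothetical. The choice to fill missing keys with `None` is also an assumption, so that every column gets one entry per row.

```python
from typing import Any, Dict, List, Optional


def rows_to_columns(rows: List[Dict[str, Any]]) -> Dict[str, List[Optional[Any]]]:
    """Convert a sequence of maps into a column-major dict of lists.

    Keys missing from a row become None, so all columns have equal length.
    """
    # dict.fromkeys preserves first-seen key order across all rows.
    names = dict.fromkeys(key for row in rows for key in row)
    return {name: [row.get(name) for row in rows] for name in names}


def columns_to_rows(columns: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Convert a column-major dict of lists back into a sequence of maps."""
    n_rows = len(next(iter(columns.values()), []))
    return [{name: col[i] for name, col in columns.items()} for i in range(n_rows)]


rows = [{"a": 1, "b": "x"}, {"a": 2}]
cols = rows_to_columns(rows)
print(cols)                   # → {'a': [1, 2], 'b': ['x', None]}
print(columns_to_rows(cols))  # → [{'a': 1, 'b': 'x'}, {'a': 2, 'b': None}]
```

The round trip is not exact when rows have different keys, because a missing key comes back as an explicit `None`. A real implementation has to decide how to represent missing values.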
Spreadsheets are generally stored in a cell-based format, meaning any cell can hold data of any type. Commonalities in parsing spreadsheet-type systems are captured here.
PCA and kernel PCA (K-PCA) using the Smile implementations.
A set of common 'pipeline' operations you will probably want to run on a dataset.
Conversion mechanisms from dataset to tensor and back.
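The core of a dataset-to-tensor conversion is a change of layout: equal-length numeric columns are packed into one row-major buffer, and unpacked again using the shape and column names. The sketch below assumes a "tensor" is just a flat Python list plus an `(n_rows, n_cols)` shape. It does not model the library's own tensor type, and the function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def dataset_to_tensor(
    columns: Dict[str, List[float]],
) -> Tuple[List[float], Tuple[int, int], List[str]]:
    """Pack equal-length numeric columns into a flat row-major buffer.

    Returns the buffer, its (n_rows, n_cols) shape, and the column names
    needed to convert back.
    """
    names = list(columns)
    lengths = {len(columns[n]) for n in names}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same length")
    n_rows = lengths.pop() if lengths else 0
    n_cols = len(names)
    # Element (r, c) is stored at index r * n_cols + c.
    buffer = [float(columns[names[c]][r]) for r in range(n_rows) for c in range(n_cols)]
    return buffer, (n_rows, n_cols), names


def tensor_to_dataset(
    buffer: Sequence[float], shape: Tuple[int, int], names: List[str]
) -> Dict[str, List[float]]:
    """Unpack a flat row-major buffer back into named columns."""
    n_rows, n_cols = shape
    if len(buffer) != n_rows * n_cols or len(names) != n_cols:
        raise ValueError("buffer, shape, and names are inconsistent")
    return {names[c]: [buffer[r * n_cols + c] for r in range(n_rows)] for c in range(n_cols)}


buf, shape, names = dataset_to_tensor({"x": [1, 2, 3], "y": [4, 5, 6]})
print(buf, shape)  # → [1.0, 4.0, 2.0, 5.0, 3.0, 6.0] (3, 2)
print(tensor_to_dataset(buf, shape, names))
```

Column names do not survive in the buffer itself, which is why the sketch returns them alongside the data so the reverse conversion can restore them.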