tech.ml.dataset.math

Clojure only.

compute-centroid-and-global-means
correlation-table
find-static
g-means
group-rows-by-nearest-centroid
impute-missing-by-centroid-averages
interpolate-loess
k-means
nan-aware-mean
nan-aware-squared-distance
to-column-major-double-array-of-arrays
to-row-major-double-array-of-arrays
transpose-double-array-of-arrays
x-means

compute-centroid-and-global-means^clj

(compute-centroid-and-global-means dataset row-major-centroids)

Returns a map with two keys:
:centroid-means - map of centroid-index -> column means (double array).
:global-means - column means (double array) for the whole dataset.
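As a rough illustration of what a "column means as a double array" value looks like, here is a minimal sketch in plain Clojure. The helpers nan-skipping-mean and column-means are hypothetical names, not part of this namespace, and skipping NaN values is an assumption about how missing entries are treated.

```clojure
;; Hypothetical helpers for illustration; not part of tech.ml.dataset.math.
(defn nan-skipping-mean
  "Mean of the non-NaN values in xs, or NaN when there are none."
  [xs]
  (let [valid (remove #(Double/isNaN (double %)) xs)]
    (if (seq valid)
      (/ (reduce + 0.0 valid) (count valid))
      Double/NaN)))

(defn column-means
  "Per-column means of row-major data, returned as a double array."
  [rows]
  (double-array (apply map (fn [& col] (nan-skipping-mean col)) rows)))

;; Made-up sample rows; the NaN stands in for a missing value.
(def rows [[1.0 Double/NaN]
           [3.0 4.0]])

(vec (column-means rows))
;; => [2.0 4.0]
```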

correlation-table^clj

(correlation-table dataset & {:keys [correlation-type colname-seq]})

Return a map of colname -> list of [colname, coefficient] tuples, sorted by descending absolute coefficient:
(sort-by (comp #(Math/abs (double %)) second) >)

Because every column correlates perfectly with itself, the first entry of each list is: [colname, 1.0]

There are three possible correlation types: :pearson, :spearman, :kendall

:pearson is the default.
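The sort expression quoted in the docstring can be tried on its own. The tuples below are made-up sample values, not output from correlation-table; they only show that ordering is by absolute coefficient, so a strong negative correlation ranks above a weak positive one.

```clojure
;; Made-up [colname coefficient] tuples for illustration.
(def correlations [["c" 0.3] ["a" 1.0] ["b" -0.9]])

;; Same sort as in the docstring: descending by absolute coefficient.
(sort-by (comp #(Math/abs (double %)) second) > correlations)
;; => (["a" 1.0] ["b" -0.9] ["c" 0.3])
```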

find-static^clj

g-means^clj

(g-means dataset & [max-k error-on-missing?])

g-means clustering. Not NaN-aware; missing values are an error. Returns an array of centroids in row-major array-of-array-of-doubles format.

group-rows-by-nearest-centroid^clj

(group-rows-by-nearest-centroid dataset
                                row-major-centroids
                                &
                                [error-on-missing?])

impute-missing-by-centroid-averages^clj

(impute-missing-by-centroid-averages dataset
                                     row-major-centroids
                                     {:keys [centroid-means global-means]})

Impute missing values by first grouping rows by nearest centroid and then computing each group's column means. When the group for a given centroid contains only NaN values, the global dataset mean is used instead. If the global mean is also NaN, the algorithm cannot replace the missing values with meaningful values. Returns a new dataset.
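The fallback order described above (original value, then centroid mean, then global mean) can be sketched for a single cell in plain Clojure. The names missing? and impute-value are hypothetical and this is not the library's implementation, only an illustration of the rule.

```clojure
;; Hypothetical helpers for illustration; not part of tech.ml.dataset.math.
(defn missing?
  "True when x is NaN."
  [x]
  (Double/isNaN (double x)))

(defn impute-value
  "Keep value if present; otherwise use the centroid mean, falling back
  to the global mean when the centroid mean is itself NaN."
  [value centroid-mean global-mean]
  (cond
    (not (missing? value))         value
    (not (missing? centroid-mean)) centroid-mean
    :else                          global-mean))

(impute-value 1.0 2.0 3.0)               ;; => 1.0
(impute-value Double/NaN 2.0 3.0)        ;; => 2.0
(impute-value Double/NaN Double/NaN 3.0) ;; => 3.0
```

When all three inputs are NaN the result is still NaN, which matches the failure case the docstring warns about.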

interpolate-loess^clj

(interpolate-loess ds x-colname y-colname)

(interpolate-loess ds
                   x-colname
                   y-colname
                   {:keys [bandwidth iterations accuracy result-name]
                    :or {bandwidth 0.75
                         iterations 4
                         accuracy LoessInterpolator/DEFAULT_ACCURACY}})

Interpolate using the LOESS regression engine. Useful for smoothing out graphs.

k-means^clj

(k-means dataset & [k max-iterations num-runs error-on-missing? tolerance])

NaN-aware k-means. Returns an array of centroids in row-major array-of-array-of-doubles format.

nan-aware-mean^clj

(nan-aware-mean col-data)

nan-aware-squared-distance^clj

(nan-aware-squared-distance lhs rhs)

NaN-aware squared distance.
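One plausible reading of a squared distance that tolerates NaN is to sum squared differences only over positions where both inputs are present. The sketch below uses a hypothetical function name and makes that assumption explicit; the library's implementation may weight or rescale the result differently.

```clojure
;; Hypothetical sketch; not the implementation behind nan-aware-squared-distance.
(defn nan-skipping-squared-distance
  "Sum of squared differences over positions where neither value is NaN."
  [lhs rhs]
  (reduce + 0.0
          (map (fn [l r]
                 (let [l (double l)
                       r (double r)]
                   (if (or (Double/isNaN l) (Double/isNaN r))
                     0.0                       ; skip pairs with a missing value
                     (let [d (- l r)] (* d d)))))
               lhs rhs)))

;; The middle pair is skipped: (1-2)^2 + (3-5)^2
(nan-skipping-squared-distance [1.0 Double/NaN 3.0] [2.0 5.0 5.0])
;; => 5.0
```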

to-column-major-double-array-of-arrays^clj

(to-column-major-double-array-of-arrays dataset & [error-on-missing?])

Convert a dataset to a column-major array of arrays. Note that if error-on-missing? is false, missing values will appear as NaN.
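To make the two layouts concrete: in a row-major array of arrays each inner array holds one row, while in a column-major one each inner array holds one column. A minimal sketch in plain Clojure, using a hypothetical transpose-sketch helper rather than the functions in this namespace:

```clojure
;; Two rows, three columns, stored row-major as a double[][].
(def row-major
  (into-array (map double-array [[1.0 2.0 3.0]
                                 [4.0 5.0 6.0]])))

;; Hypothetical helper: swap rows and columns of a double[][].
(defn transpose-sketch
  [aoa]
  (into-array (apply map (fn [& xs] (double-array xs)) aoa)))

;; Column-major view of the same data: one inner array per column.
(mapv vec (transpose-sketch row-major))
;; => [[1.0 4.0] [2.0 5.0] [3.0 6.0]]
```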

to-row-major-double-array-of-arrays^clj

(to-row-major-double-array-of-arrays dataset & [error-on-missing?])

Convert a dataset to a row-major array of arrays. Note that if error-on-missing? is false, missing values will appear as NaN.

transpose-double-array-of-arrays^clj

(transpose-double-array-of-arrays input-data)

x-means^clj

(x-means dataset & [max-k error-on-missing?])

x-means clustering. Not NaN-aware; missing values are an error. Returns an array of centroids in row-major array-of-array-of-doubles format.
