(compute-centroid-and-global-means dataset row-major-centroids)
Return a map with two keys: :centroid-means, a map of centroid-index -> (double array) column means, and :global-means, a double array of global column means for the dataset.
(correlation-table dataset & {:keys [correlation-type colname-seq]})
Return a map of colname -> sequence of [colname, coefficient] tuples, sorted by descending absolute coefficient. Sort is: (sort-by (comp #(Math/abs (double %)) second) >)
Thus the first entry is: [colname, 1.0], since each column correlates perfectly with itself.
There are three possible correlation types: :pearson :spearman :kendall
:pearson is the default.
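As a small illustration of the sort order quoted in the docstring, the sketch below applies the same `sort-by` expression to a hand-written list of tuples. The column names and coefficients are made up for the example and do not come from any real dataset.

```clojure
;; Hypothetical [colname coefficient] tuples, deliberately unsorted.
(def coefficients
  [["c" 0.3] ["a" 1.0] ["d" -0.1] ["b" -0.9]])

;; Same expression as in the docstring: order by descending absolute value
;; of the coefficient, so strong negative correlations rank near the top.
(def sorted-coefficients
  (sort-by (comp #(Math/abs (double %)) second) > coefficients))
```

Because the sort key is the absolute value, a coefficient of -0.9 is placed ahead of 0.3.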
(g-means dataset & [max-k error-on-missing?])
G-means clustering. Not NaN-aware; missing values are an error. Returns an array of centroids in row-major array-of-array-of-doubles format.
(group-rows-by-nearest-centroid dataset
row-major-centroids
&
[error-on-missing?])
(impute-missing-by-centroid-averages dataset
row-major-centroids
{:keys [centroid-means global-means]})
Impute missing values by first grouping rows by nearest centroid and then computing the per-column mean of each group. If the group for a given centroid contains only NaNs in a column, use the global dataset mean instead. If the global mean is also NaN, this algorithm will fail to replace the missing values with meaningful values. Returns a new dataset.
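The fallback order described above (group mean first, then global mean) can be sketched in plain Clojure. This is only an illustration of the rule for a single value, not the library's implementation; `nan-aware-mean` and `impute-value` are hypothetical helper names introduced for this example.

```clojure
(defn nan-aware-mean
  "Mean of the non-NaN values; NaN when every value is NaN.
  Hypothetical helper, not part of the documented API."
  [values]
  (let [present (remove #(Double/isNaN (double %)) values)]
    (if (empty? present)
      Double/NaN
      (/ (reduce + present) (double (count present))))))

(defn impute-value
  "Pick a replacement for one value: keep it if present, otherwise use the
  centroid-group mean, otherwise fall back to the global mean.
  Hypothetical helper, not part of the documented API."
  [value centroid-mean global-mean]
  (cond
    (not (Double/isNaN (double value)))         value
    (not (Double/isNaN (double centroid-mean))) centroid-mean
    :else                                       global-mean))
```

When both means are NaN the result is still NaN, which matches the failure case the docstring warns about.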
(interpolate-loess ds x-colname y-colname)
(interpolate-loess ds
x-colname
y-colname
{:keys [bandwidth iterations accuracy result-name]
:or {bandwidth 0.75
iterations 4
accuracy LoessInterpolator/DEFAULT_ACCURACY}})
Interpolate using the LOESS regression engine. Useful for smoothing out graphs.
(k-means dataset & [k max-iterations num-runs error-on-missing? tolerance])
NaN-aware k-means. Returns an array of centroids in row-major array-of-array-of-doubles format.
(nan-aware-squared-distance lhs rhs)
NaN-aware squared distance.
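The docstring does not say how NaN entries are treated. One plausible reading, shown here purely as an assumption, is that any coordinate where either side is NaN is skipped. The function name `nan-skipping-squared-distance` is hypothetical, and the library's actual behavior may differ (for example, it could rescale the sum by the number of usable coordinates).

```clojure
(defn nan-skipping-squared-distance
  "Sum of squared differences over coordinates where both values are present.
  Assumption: coordinates with a NaN on either side contribute nothing.
  Hypothetical helper, not part of the documented API."
  [lhs rhs]
  (reduce + 0.0
          (map (fn [l r]
                 (let [d (- (double l) (double r))]
                   ;; d is NaN whenever either input is NaN
                   (if (Double/isNaN d) 0.0 (* d d))))
               lhs rhs)))
```

A plain squared distance would return NaN as soon as a single coordinate is missing; skipping those coordinates keeps the distance usable for nearest-centroid comparisons.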
(to-column-major-double-array-of-arrays dataset & [error-on-missing?])
Convert a dataset to a column-major array of arrays. Note that if error-on-missing? is false, missing values will appear as NaN.
(to-row-major-double-array-of-arrays dataset & [error-on-missing?])
Convert a dataset to a row-major array of arrays. Note that if error-on-missing? is false, missing values will appear as NaN.
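To make the two layouts concrete, here is a minimal plain-Clojure sketch that does not use the library. It assumes a dataset can be modeled as a vector of columns with nil marking a missing value; that model and the names `columns`, `column-major`, and `row-major` are illustrative only.

```clojure
;; Hypothetical two-column, three-row dataset; nil stands for a missing value.
(def columns
  [[1.0 nil 3.0]
   [10.0 20.0 30.0]])

;; Column-major: one double array per column, missing values become NaN.
(def column-major
  (into-array (map (fn [col]
                     (double-array (map #(if (nil? %) Double/NaN %) col)))
                   columns)))

;; Row-major: one double array per row, obtained by transposing the columns.
(def row-major
  (into-array (map double-array
                   (apply map vector (map vec column-major)))))
```

Here `column-major` has 2 inner arrays of length 3, while `row-major` has 3 inner arrays of length 2; the missing cell shows up as NaN in both.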
(x-means dataset & [max-k error-on-missing?])
X-means clustering. Not NaN-aware; missing values are an error. Returns an array of centroids in row-major array-of-array-of-doubles format.