Clustering.

Various clustering algorithms backed by the SMILE library.

Only partition clustering is implemented.

### Input data

Input is always a sequence of n-dimensional samples, each given as a sequence.

For example, 2d samples `[[1 2] [2 2] [3 3] ...]`

For 1d data you can pass either a sequence of numbers or a sequence of 1d sequences of numbers:

```clojure
[1 2 3]
;; or
[[1] [2] [3]]
```

### Distances

Some of the methods use distance functions. Use the [[fastmath.distance]] namespace to create one.

### Output

Every function returns a record which contains:

* `:type` - name of the method used
* `:data` - input data
* `:clustering` - sequence of cluster ids
* `:sizes` - sizes of clusters
* `:clusters` - number of clusters
* `:predict` - predicting function (see below), classifies an additional sample
* `:representatives` - list of centroids or averages
* `:info` - additional statistics for your samples (like distortion)
* `:obj` - SMILE object

A cluster id is an integer ranging from 0 to the number of clusters minus 1. Some methods mark outliers with [[outlier-id]].

The record acts as a function and can classify an additional sample by calling the `:predict` function (or just call [[predict]]), for example (`data` is a sequence of 3d samples):

```clojure
(let [cl (k-means data 10)] (cl [0 1 2]))
```

See [[k-means]]

#### Regrouping

A clustering record can be regrouped into a list of individual clusters. Call [[regroup]] to get a list of maps with the following structure:

* `:key` - cluster id or `:outliers`
* `:data` - samples which belong to the cluster
* `:representative` - centroid or average vector if the former is not available
* `:size` - size of cluster
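The following is a minimal sketch of how the pieces above fit together, assuming the namespace is required as `fastmath.clustering` under the alias `c` (the alias and the sample data are made up for illustration). It builds a clustering with `k-means`, reads a few of the record keys listed above, and classifies a new sample in the three ways the text describes.

```clojure
(require '[fastmath.clustering :as c])

;; Two well-separated groups of 2d samples (illustration data only).
(def data [[1.0 1.0] [1.2 0.9] [0.8 1.1]
           [8.0 8.0] [8.1 7.9] [7.9 8.2]])

(def cl (c/k-means data 2))

(:type cl)        ; name of the method used
(:clusters cl)    ; number of clusters
(:clustering cl)  ; one cluster id per input sample
(:sizes cl)       ; sizes of clusters

;; Three ways to classify a new sample, as described in the text:
(cl [1.0 1.1])             ; the record called as a function
(c/predict cl [1.0 1.1])   ; the predict function
((:predict cl) [1.0 1.1])  ; the function stored under :predict

;; List of maps with :key, :data, :representative and :size.
(c/regroup cl)
```

Cluster ids depend on the initialization of the algorithm, so no particular id is assumed for either group.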
(clarans data clusters)
(clarans data dist clusters)
(clarans data dist clusters max-neighbor)
Clustering Large Applications based upon RANdomized Search algorithm.

Input:

* data - sequence of samples
* dist (optional) - distance method, default `euclidean`
* clusters - number of clusters
* max-neighbor (optional) - maximum number of neighbors checked during random search

See more in [SMILE doc](https://haifengl.github.io/api/java/smile/clustering/CLARANS.html)
List of clustering methods.
(dbscan data min-pts radius)
(dbscan data dist min-pts radius)
Density-Based Spatial Clustering of Applications with Noise algorithm.

Input:

* data - sequence of samples
* dist (optional) - distance method, default `euclidean`
* min-pts - minimum number of neighbors
* radius - the neighborhood radius

See more in [SMILE doc](https://haifengl.github.io/api/java/smile/clustering/DBSCAN.html)
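Because density-based methods can leave some samples unassigned, it may help to see how outliers could be located in the result. The sketch below assumes the namespace alias `c`, made-up sample data, and arbitrary values for `min-pts` and `radius`; it relies only on `dbscan`, the `:clustering` key and `outlier-id` as documented on this page.

```clojure
(require '[fastmath.clustering :as c])

;; One dense group plus a single far-away point (illustration data only).
(def data [[0.0 0.0] [0.1 0.0] [0.0 0.1] [0.1 0.1] [0.05 0.05]
           [10.0 10.0]])

;; min-pts = 3, radius = 0.5 (values picked for this toy data set)
(def cl (c/dbscan data 3 0.5))

;; Indices of the samples whose cluster id equals outlier-id.
(def outlier-indices
  (keep-indexed (fn [i id] (when (= id c/outlier-id) i))
                (:clustering cl)))
```

Whether a given point ends up as an outlier depends on `min-pts` and `radius`, so the values above are a starting point rather than a recommendation.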
(denclue data sigma m)
(denclue data sigma m tolerance)
(denclue data sigma m tolerance min-pts)
DENsity CLUstering algorithm.

Input:

* data - sequence of samples
* sigma - gaussian kernel parameter
* m - number of selected samples, much smaller than number of all samples
* tolerance (optional) - tolerance of hill-climbing procedure
* min-pts (optional) - minimum number of neighbors for a core attractor

See more in [SMILE doc](https://haifengl.github.io/api/java/smile/clustering/DENCLUE.html)
(deterministic-annealing data max-clusters)
(deterministic-annealing data max-clusters alpha)
(deterministic-annealing data max-clusters alpha max-iter)
(deterministic-annealing data max-clusters alpha max-iter tolerance)
(deterministic-annealing data max-clusters alpha max-iter tolerance split-tolerance)
Deterministic Annealing algorithm.

Input:

* data - sequence of samples
* max-clusters - maximum number of clusters
* alpha (optional) - temperature decreasing factor (valued from 0 to 1)
* max-iter (optional) - maximum number of iterations
* tolerance (optional) - tolerance of convergence test
* split-tolerance (optional) - tolerance to split a cluster

See more in [SMILE doc](https://haifengl.github.io/api/java/smile/clustering/DeterministicAnnealing.html)
(g-means data clusters)
(g-means data clusters max-iter)
(g-means data clusters max-iter tolerance)
G-Means

Input:

* data - sequence of samples
* clusters - number of clusters
* max-iter (optional) - maximum number of iterations
* tolerance (optional) - tolerance of convergence test

See more in [SMILE doc](https://haifengl.github.io/api/java/smile/clustering/GMeans.html)
(k-means data clusters)
(k-means data clusters max-iter)
(k-means data clusters max-iter tolerance)
K-Means++ algorithm.

Input:

* data - sequence of samples
* clusters - number of clusters
* max-iter (optional) - maximum number of iterations
* tolerance (optional) - tolerance of convergence test

See more in [SMILE doc](https://haifengl.github.io/api/java/smile/clustering/KMeans.html)
(lloyd data clusters)
(lloyd data clusters max-iter)
(lloyd data clusters max-iter tolerance)
K-Means algorithm (Lloyd).

Input:

* data - sequence of samples
* clusters - number of clusters
* max-iter (optional) - maximum number of iterations
* tolerance (optional) - tolerance of convergence test

See more in [SMILE doc](https://haifengl.github.io/api/java/smile/clustering/KMeans.html)
(mec data max-clusters radius)
(mec data dist max-clusters radius)
Nonparametric Minimum Conditional Entropy Clustering algorithm.

Input:

* data - sequence of samples
* dist (optional) - distance method, default `:euclidean`
* max-clusters - maximum number of clusters
* radius - the neighborhood radius

See more in [SMILE doc](https://haifengl.github.io/api/java/smile/clustering/MEC.html)
Id of the cluster which contains outliers.
(predict cluster in)
Predict the cluster for a given vector.
(regroup {:keys [clustering data representatives sizes]})
Transform a clustering result into a list of clusters as separate maps.

Every map contains:

* `:key` - cluster id or `:outliers`
* `:data` - samples which belong to the cluster
* `:representative` - centroid/medoid or average vector if the former is not available
* `:size` - size of cluster

The representative is always an n-dimensional sequence, even if the input is a list of numbers.

Empty clusters are skipped.
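To make the shape of the output concrete, here is a simplified, pure-Clojure sketch of the grouping step. It is not the library's implementation: `regroup-sketch` and the hand-written `result` map are hypothetical names introduced for illustration, and the sketch ignores outliers and the fallback to an average vector.

```clojure
;; A hand-written stand-in for a clustering result; only the keys used here.
(def result
  {:clustering      [0 1 0 1 0]
   :data            [[1 1] [9 9] [1 2] [8 9] [2 1]]
   :representatives [[1.33 1.33] [8.5 9.0]]})

(defn regroup-sketch
  "Groups samples by cluster id and attaches the representative and size.
  Cluster ids with no samples never appear, so empty clusters are skipped."
  [{:keys [clustering data representatives]}]
  (for [[id samples] (sort-by key (group-by first (map vector clustering data)))]
    {:key            id
     :data           (mapv second samples)
     :representative (nth representatives id)
     :size           (count samples)}))

(regroup-sketch result)
;; => ({:key 0, :data [[1 1] [1 2] [2 1]], :representative [1.33 1.33], :size 3}
;;     {:key 1, :data [[9 9] [8 9]], :representative [8.5 9.0], :size 2})
```

The real function additionally reports outliers under the `:outliers` key, as described above.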
(spectral data clusters sigma)
(spectral data clusters samples sigma)
(spectral data clusters sigma max-iters tolerance)
(spectral data clusters samples sigma max-iters tolerance)
Spectral clustering

Input:

* data - sequence of samples
* clusters - number of clusters
* samples (optional) - number of random samples for Nystrom approximation
* sigma - width parameter for Gaussian kernel
* max-iters (optional) - maximum number of iterations
* tolerance (optional) - tolerance of k-means convergence test

See more in [SMILE doc](https://haifengl.github.io/api/java/smile/clustering/SpectralClustering.html)
(x-means data clusters)
(x-means data clusters max-iter)
(x-means data clusters max-iter tolerance)
X-Means

Input:

* data - sequence of samples
* clusters - number of clusters
* max-iter (optional) - maximum number of iterations
* tolerance (optional) - tolerance of convergence test

See more in [SMILE doc](https://haifengl.github.io/api/java/smile/clustering/XMeans.html)