Clustering algorithms.

Various clustering algorithms backed by the SMILE library.

Currently implemented: only partition clustering.

### Input data

Input is always a sequence of n-dimensional samples, each given as a sequence.

For example, 2d samples: `[[1 2] [2 2] [3 3] ...]`

For 1d data you can pass either a sequence of numbers or a sequence of 1d sequences of numbers:

```clojure
[1 2 3]
;; or
[[1] [2] [3]]
```

### Distances

Some of the methods use distance functions. Use the [[fastmath.distance]] namespace to create one.

### Output

Every function returns a record which contains:

* `:type` - name of the method used
* `:data` - input data
* `:clustering` - sequence of cluster ids
* `:sizes` - sizes of clusters
* `:clusters` - number of clusters
* `:predict` - predicting function (see below), qualifies an additional sample
* `:representatives` - list of centroids or medoids, if available
* `:info` - additional statistics for your samples (like distortion)
* `:obj` - SMILE object

A cluster id is an integer ranging from 0 to the number of clusters minus 1. Some methods mark outliers with [[outlier-id]].

The record acts as a function and can qualify an additional sample. Calling it is the same as calling the `:predict` function or [[predict]]. For example (`data` is a sequence of 3d samples):

```clojure
(let [cl (k-means data 10)] (cl [0 1 2]))
```

See [[k-means]]

#### Regrouping

A clustering record can be regrouped into a list of individual clusters. Call [[regroup]] to get a list of maps with the following structure:

* `:key` - cluster id
* `:data` - samples which belong to the cluster
* `:outliers?` - whether the cluster contains outliers
* `:representative` - centroid/medoid, or average vector if the former is not available
* `:size` - size of the cluster
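The following is a minimal sketch of reading the returned record, assuming the functions live in the `fastmath.clustering` namespace (the namespace name and the sample data are assumptions, not stated on this page) and that the library and SMILE are on the classpath.

```clojure
;; Assumption: the functions documented here live in `fastmath.clustering`.
(require '[fastmath.clustering :as c])

;; Illustrative data: six 2d samples in two well-separated groups.
(def data [[0.0 0.0] [0.1 0.2] [0.2 0.1]
           [5.0 5.0] [5.1 5.2] [5.2 5.1]])

(def cl (c/k-means data 2))

(:type cl)            ; name of the method used
(:clusters cl)        ; number of clusters
(:clustering cl)      ; one cluster id per input sample
(:representatives cl) ; centroids, if available

;; Three ways to qualify a new sample, as described above.
(cl [0.1 0.1])
((:predict cl) [0.1 0.1])
(c/predict cl [0.1 0.1])
```

Which cluster receives id 0 is not fixed, so code that consumes the result should compare ids rather than rely on a particular numbering.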
(clarans data clusters)
(clarans data dist clusters)
(clarans data dist clusters max-neighbor)
(clarans data dist clusters max-neighbor num-local)
Clustering Large Applications based upon RANdomized Search algorithm.

Input:

* data - sequence of samples
* clusters - number of clusters

Optional:

* dist - distance method, default `:euclidean`
* max-neighbor - maximum number of neighbors checked during random search
* num-local - the number of local minima to search for

See more in [SMILE doc](https://haifengl.github.io/smile/api/java/smile/clustering/CLARANS.html)
List of clustering methods.
(dbscan data min-pts radius)
(dbscan data dist min-pts radius)
Density-Based Spatial Clustering of Applications with Noise algorithm.

Input:

* data - sequence of samples
* dist (optional) - distance method, default `:euclidean`
* min-pts - minimum number of neighbors
* radius - the neighborhood radius

See more in [SMILE doc](https://haifengl.github.io/smile/api/java/smile/clustering/DBSCAN.html)
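A hedged usage sketch: DBSCAN is one of the methods that can mark samples as outliers, so the example below pairs each sample with its cluster id and keeps those whose id equals `outlier-id`. The `fastmath.clustering` namespace name, the data and the parameter values are assumptions chosen for illustration only.

```clojure
;; Assumption: the functions documented here live in `fastmath.clustering`.
(require '[fastmath.clustering :as c])

;; Illustrative data: two tight groups and one isolated sample.
(def data [[0.0 0.0] [0.1 0.0] [0.0 0.1] [0.1 0.1]
           [5.0 5.0] [5.1 5.0] [5.0 5.1] [5.1 5.1]
           [20.0 20.0]])

;; min-pts = 3, radius = 0.5 (illustrative values)
(def cl (c/dbscan data 3 0.5))

;; Pair every sample with its cluster id and keep the ones flagged as outliers.
(def outliers
  (->> (map vector data (:clustering cl))
       (filter (fn [[_ id]] (= id c/outlier-id)))
       (map first)))
```

Whether a given sample ends up flagged depends on `min-pts` and `radius`, so the content of `outliers` should be inspected rather than assumed.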
(denclue data sigma m)
DENsity CLUstering algorithm.

Input:

* data - sequence of samples
* sigma - gaussian kernel parameter
* m - number of selected samples, should be much smaller than the number of all samples

See more in [SMILE doc](https://haifengl.github.io/smile/api/java/smile/clustering/DENCLUE.html)
(deterministic-annealing data max-clusters)
(deterministic-annealing data max-clusters alpha)
Deterministic Annealing algorithm.

Input:

* data - sequence of samples
* max-clusters - maximum number of clusters
* alpha (optional) - temperature decreasing factor (valued from 0 to 1)

See more in [SMILE doc](https://haifengl.github.io/smile/api/java/smile/clustering/DeterministicAnnealing.html)
(g-means data max-clusters)
G-Means algorithm.

Input:

* data - sequence of samples
* max-clusters - maximum number of clusters

See more in [SMILE doc](https://haifengl.github.io/smile/api/java/smile/clustering/GMeans.html)
(k-means data clusters)
(k-means data clusters max-iter)
(k-means data clusters max-iter runs)
K-Means++ algorithm.

Input:

* data - sequence of samples
* clusters - number of clusters
* max-iter (optional) - maximum number of iterations
* runs (optional) - maximum number of runs

See more in [SMILE doc](https://haifengl.github.io/smile/api/java/smile/clustering/KMeans.html)
(mec data max-clusters radius)
(mec data dist max-clusters radius)
Nonparametric Minimum Conditional Entropy Clustering algorithm.

Input:

* data - sequence of samples
* dist (optional) - distance method, default `:euclidean`
* max-clusters - maximum number of clusters
* radius - the neighborhood radius

See more in [SMILE doc](https://haifengl.github.io/smile/api/java/smile/clustering/MEC.html)
(neural-gas data clusters)
(neural-gas data clusters lambda-i lambda-f eps-i eps-f steps)
Neural Gas algorithm.

Input:

* data - sequence of samples
* clusters - number of clusters

Optional:

* lambda-i - initial lambda value (soft learning radius/rate)
* lambda-f - final lambda value
* eps-i - initial epsilon value (learning rate)
* eps-f - final epsilon value
* steps - number of iterations

See more in [SMILE doc](https://haifengl.github.io/smile/api/java/smile/vq/NeuralGas.html)
Id of the cluster which contains outliers.
(predict cluster in)
Predict the cluster for a given vector.
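For centroid-based methods such as k-means, qualifying a new vector conceptually amounts to picking the nearest representative. The pure-Clojure sketch below illustrates that idea only; it is not the library's implementation, other methods (for example density-based ones) predict differently, and the helper names `squared-distance` and `nearest-representative` are hypothetical.

```clojure
;; Hypothetical helpers illustrating nearest-centroid assignment;
;; not part of the library.
(defn squared-distance
  "Squared Euclidean distance between two equally sized vectors."
  [a b]
  (reduce + (map (fn [x y] (let [d (- x y)] (* d d))) a b)))

(defn nearest-representative
  "Index of the representative closest to `sample`."
  [representatives sample]
  (->> representatives
       (map-indexed (fn [id r] [id (squared-distance r sample)]))
       (apply min-key second)
       first))

(nearest-representative [[0.0 0.0] [5.0 5.0]] [4.0 4.5])
;; => 1
```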
(regroup clustered-data)
Transform a clustering result into a list of clusters as separate maps.

Every map contains:

* `:key` - cluster id
* `:data` - samples which belong to the cluster
* `:outliers?` - whether the cluster contains outliers
* `:representative` - centroid/medoid, or average vector if the former is not available
* `:size` - size of the cluster

The representative is always an n-dimensional sequence, even if the input is a list of numbers.

Empty clusters are skipped.
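To make the shape of the result concrete, here is a pure-Clojure sketch that builds maps with the same keys from samples and their cluster ids. It is an illustration under stated assumptions, not the library's implementation: `regroup-sketch` and `mean-vector` are hypothetical names, the representative is always the average vector, the outlier id is passed in as a parameter, and the output is sorted by `:key` for readability.

```clojure
;; Hypothetical helpers illustrating the regrouped structure;
;; not part of the library.
(defn mean-vector
  "Component-wise average of a non-empty collection of equally sized vectors."
  [samples]
  (let [n (count samples)]
    (apply mapv (fn [& xs] (/ (reduce + xs) n)) samples)))

(defn regroup-sketch
  "Groups `data` by the ids in `clustering`. Ids with no samples never
  appear in the result, which mirrors skipping empty clusters."
  [data clustering outlier-id]
  (->> (map vector clustering data)
       (group-by first)
       (map (fn [[id pairs]]
              (let [samples (mapv second pairs)]
                {:key id
                 :data samples
                 :outliers? (= id outlier-id)
                 :representative (mean-vector samples)
                 :size (count samples)})))
       (sort-by :key)))

;; Integer/MAX_VALUE is only a placeholder for the outlier id here.
(regroup-sketch [[0.0 0.0] [2.0 2.0] [10.0 10.0]] [0 0 1] Integer/MAX_VALUE)
;; => ({:key 0, :data [[0.0 0.0] [2.0 2.0]], :outliers? false,
;;      :representative [1.0 1.0], :size 2}
;;     {:key 1, :data [[10.0 10.0]], :outliers? false,
;;      :representative [10.0 10.0], :size 1})
```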
(x-means data max-clusters)
X-Means algorithm.

Input:

* data - sequence of samples
* max-clusters - maximum number of clusters

See more in [SMILE doc](https://haifengl.github.io/smile/api/java/smile/clustering/XMeans.html)