fastmath.clustering

Clustering algorithms.

Various clustering algorithms backed by the SMILE library.

Currently implemented: only partition clustering.

Input data

Input is always a sequence of n-dimensional samples, where each sample is itself a sequence.

For example, 2d samples [[1 2] [2 2] [3 3] ...]

For 1d data you can pass either a sequence of numbers or a sequence of 1d sequences of numbers:

[1 2 3]
;; or
[[1] [2] [3]]
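
Both forms describe the same samples. As a minimal sketch in plain Clojure, flat 1d input could be brought into the nested form like this; `wrap-1d` is a hypothetical helper used only for illustration and is not part of fastmath:

```clojure
;; Hypothetical helper (not part of fastmath): wrap bare numbers into
;; one-element vectors so 1d data has the same shape as n-d data.
(defn wrap-1d
  [samples]
  (mapv (fn [s] (if (sequential? s) (vec s) [s])) samples))

(wrap-1d [1 2 3])       ;; => [[1] [2] [3]]
(wrap-1d [[1] [2] [3]]) ;; => [[1] [2] [3]]
```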

Distances

Some of the methods use distance functions. Currently supported are:

  • :euclidean
  • :manhattan
  • :chebyshev
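
For orientation, the three metrics can be sketched in plain Clojure using their textbook definitions. These functions are illustrative assumptions only; they are not the implementations used by fastmath or SMILE:

```clojure
;; Textbook definitions for illustration; not the library's own code.

;; Square root of the sum of squared coordinate differences.
(defn euclidean [a b]
  (Math/sqrt (reduce + (map (fn [x y] (let [d (- x y)] (* d d))) a b))))

;; Sum of absolute coordinate differences.
(defn manhattan [a b]
  (reduce + (map (fn [x y] (Math/abs (double (- x y)))) a b)))

;; Largest absolute coordinate difference.
(defn chebyshev [a b]
  (reduce max (map (fn [x y] (Math/abs (double (- x y)))) a b)))

(euclidean [0 0] [3 4]) ;; => 5.0
(manhattan [0 0] [3 4]) ;; => 7.0
(chebyshev [0 0] [3 4]) ;; => 4.0
```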

Output

Every function returns a record which contains:

  • :type - name of the method used
  • :data - input data
  • :clustering - sequence of cluster ids
  • :sizes - sizes of clusters
  • :clusters - number of clusters
  • :predict - predicting function (see below) which assigns an additional sample to a cluster
  • :representatives - list of centroids or medoids if available
  • :info - additional statistics for your samples (like distortion)
  • :obj - SMILE object

Cluster id is an integer ranging from 0 to the number of clusters minus 1. Some methods mark outliers with outlier-id.

The record acts as a function and can assign an additional sample to a cluster by calling the :predict function. For example (data is a sequence of 3d samples):

(let [cl (k-means data 10)] (cl [0 1 2]))

See k-means
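
A slightly longer sketch of reading the result record, using only the keys listed above. The alias `cl` and the sample values are assumptions made for this example, and the actual cluster ids depend on the run:

```clojure
(require '[fastmath.clustering :as cl])

;; Made-up 2d samples forming two well-separated groups.
(def data [[1.0 1.0] [1.2 0.9] [0.8 1.1]
           [8.0 8.0] [8.2 7.9] [7.9 8.1]])

(def result (cl/k-means data 2))

(:type result)            ;; name of the method used
(:clustering result)      ;; one cluster id per input sample
(:sizes result)           ;; sizes of clusters
(:representatives result) ;; centroids, if available
(result [1.0 1.0])        ;; cluster id predicted for a new sample
```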

Regrouping

A clustering record can be regrouped into a list of individual clusters. Call regroup to get a list of maps with the following structure:

  • :key - cluster id
  • :data - samples which belong to the cluster
  • :outliers? - whether the cluster contains outliers
  • :representative - centroid/medoid or average vector if the former is not available
  • :size - size of cluster

clarans

(clarans data clusters)
(clarans data dist clusters)
(clarans data dist clusters max-neighbor)
(clarans data dist clusters max-neighbor num-local)

Clustering Large Applications based upon RANdomized Search algorithm.

Input:

  • data - sequence of samples
  • clusters - number of clusters

Optional:

  • dist - distance method, default :euclidean
  • max-neighbor - maximum number of neighbors checked during random search
  • num-local - the number of local minima to search for
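
A usage sketch for the full arity, passing every optional argument explicitly. The alias `cl`, the sample data, and the parameter values are assumptions chosen for illustration and are not tuned recommendations:

```clojure
(require '[fastmath.clustering :as cl])

;; Made-up 2d samples forming two groups.
(def data [[1.0 1.0] [1.5 2.0] [1.2 0.8]
           [9.0 9.0] [8.5 9.5] [9.2 8.8]])

;; Manhattan distance, 2 clusters, at most 3 neighbors checked per
;; search step, 2 local minima searched.
(def result (cl/clarans data :manhattan 2 3 2))

(:representatives result) ;; medoids, if available
(:clustering result)      ;; cluster id for each sample
```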

See more in the SMILE doc: https://haifengl.github.io/smile/api/java/smile/clustering/CLARANS.html

clustering-methods-list

List of clustering methods.

dbscan

(dbscan data min-pts radius)
(dbscan data dist min-pts radius)

Density-Based Spatial Clustering of Applications with Noise algorithm.

Input:

  • data - sequence of samples
  • dist (optional) - distance method, default :euclidean
  • min-pts - minimum number of neighbors
  • radius - the neighborhood radius
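
A usage sketch with the default distance. The alias `cl`, the data, and the parameter values are assumptions made for this example; which samples end up marked as noise depends on the algorithm's result:

```clojure
(require '[fastmath.clustering :as cl])

;; Two dense groups plus one far-away sample (made-up values).
(def data [[1.0 1.0] [1.1 1.0] [1.0 1.1] [1.1 1.1]
           [5.0 5.0] [5.1 5.0] [5.0 5.1] [5.1 5.1]
           [20.0 20.0]])

;; min-pts = 3, radius = 0.5, default :euclidean distance.
(def result (cl/dbscan data 3 0.5))

;; Cluster ids equal to outlier-id mark samples treated as outliers.
(filter #(= cl/outlier-id %) (:clustering result))
```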

See more in the SMILE doc: https://haifengl.github.io/smile/api/java/smile/clustering/DBScan.html

denclue

(denclue data sigma m)

DENsity CLUstering algorithm.

Input:

  • data - sequence of samples
  • sigma - Gaussian kernel parameter
  • m - number of selected samples, much smaller than the number of all samples

See more in the SMILE doc: https://haifengl.github.io/smile/api/java/smile/clustering/DENCLUE.html

deterministic-annealing

(deterministic-annealing data max-clusters)
(deterministic-annealing data max-clusters alpha)

Deterministic Annealing algorithm.

Input:

  • data - sequence of samples
  • max-clusters - maximum number of clusters
  • alpha (optional) - temperature decreasing factor (a value between 0 and 1)

See more in the SMILE doc: https://haifengl.github.io/smile/api/java/smile/clustering/DeterministicAnnealing.html

distances-list

List of distances used in some clustering methods.

g-means

(g-means data max-clusters)

G-Means algorithm.

Input:

  • data - sequence of samples
  • max-clusters - maximum number of clusters

See more in the SMILE doc: https://haifengl.github.io/smile/api/java/smile/clustering/GMeans.html

k-means

(k-means data clusters)
(k-means data clusters max-iter)
(k-means data clusters max-iter runs)

K-Means++ algorithm.

Input:

  • data - sequence of samples
  • clusters - number of clusters
  • max-iter (optional) - maximum number of iterations
  • runs (optional) - maximum number of runs
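
A usage sketch for the four-argument arity. The alias `cl`, the data, and the values for max-iter and runs are assumptions picked for illustration:

```clojure
(require '[fastmath.clustering :as cl])

;; Made-up 2d samples forming three groups.
(def data [[1.0 1.0] [1.1 0.9] [0.9 1.1]
           [5.0 5.0] [5.2 4.9] [4.8 5.1]
           [9.0 1.0] [9.1 0.9] [8.9 1.1]])

;; 3 clusters, at most 100 iterations, at most 5 runs.
(def result (cl/k-means data 3 100 5))

(:representatives result) ;; centroids
(:info result)            ;; additional statistics (like distortion)
```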

See more in the SMILE doc: https://haifengl.github.io/smile/api/java/smile/clustering/KMeans.html

mec

(mec data max-clusters radius)
(mec data dist max-clusters radius)

Nonparametric Minimum Conditional Entropy Clustering algorithm.

Input:

  • data - sequence of samples
  • dist (optional) - distance method, default :euclidean
  • max-clusters - maximum number of clusters
  • radius - the neighborhood radius

See more in the SMILE doc: https://haifengl.github.io/smile/api/java/smile/clustering/MEC.html

neural-gas

(neural-gas data clusters)
(neural-gas data clusters lambda-i lambda-f eps-i eps-f steps)

Neural Gas algorithm.

Input:

  • data - sequence of samples
  • clusters - number of clusters

Optional:

  • lambda-i - initial lambda value (soft learning radius/rate)
  • lambda-f - final lambda value
  • eps-i - initial epsilon value (learning rate)
  • eps-f - final epsilon value
  • steps - number of iterations

See more in the SMILE doc: https://haifengl.github.io/smile/api/java/smile/vq/NeuralGas.html

outlier-id

Id of the cluster which contains outliers.

regroup

(regroup clustered-data)

Transform a clustering result into a list of clusters as separate maps.

Every map contains:

  • :key - cluster id
  • :data - samples which belong to the cluster
  • :outliers? - does it contain outliers or not
  • :representative - centroid/medoid or average vector if the former is not available
  • :size - size of cluster

The representative is always an n-dimensional sequence, even if the input is a list of numbers.

Empty clusters are skipped.
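
A usage sketch that regroups a k-means result and prints a one-line summary per cluster. The alias `cl` and the sample data are assumptions made for this example:

```clojure
(require '[fastmath.clustering :as cl])

;; Made-up 2d samples forming two well-separated groups.
(def data [[1.0 1.0] [1.2 0.9] [0.8 1.1]
           [8.0 8.0] [8.2 7.9] [7.9 8.1]])

(def groups (cl/regroup (cl/k-means data 2)))

;; Print cluster id, size, representative and outlier flag per cluster.
(doseq [{:keys [key size representative outliers?]} groups]
  (println key size representative outliers?))
```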

x-means

(x-means data max-clusters)

X-Means algorithm.

Input:

  • data - sequence of samples
  • max-clusters - maximum number of clusters

See more in the SMILE doc: https://haifengl.github.io/smile/api/java/smile/clustering/XMeans.html
