
fastmath.clustering

Clustering algorithms.

Various clustering algorithms backed by the SMILE library.

Currently implemented: only partition clustering.

Input data

Input is always a sequence of samples, where each sample is itself a sequence of n numbers.

For example, 2d samples: [[1 2] [2 2] [3 3] ...]

For 1d data you can pass either a sequence of numbers or a sequence of 1d sequences of numbers:

[1 2 3]
;; or
[[1] [2] [3]]
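
To make the correspondence between the two 1d forms concrete, here is a small plain-Clojure sketch. The helper name `as-samples` is hypothetical and not part of fastmath; the library accepts both forms directly, so this only illustrates how they relate.

```clojure
;; Hypothetical helper (not part of fastmath): wrap bare numbers
;; into one-element samples and leave sequential samples as vectors.
(defn as-samples
  [data]
  (mapv (fn [x] (if (sequential? x) (vec x) [x])) data))

(as-samples [1 2 3])        ;; => [[1] [2] [3]]
(as-samples [[1] [2] [3]])  ;; => [[1] [2] [3]]
```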

Distances

Some of the methods use distance functions; use the fastmath.distance namespace to create one.

Output

Every function returns a record which contains:

  • :type - name of the method used
  • :data - input data
  • :clustering - sequence of cluster ids
  • :sizes - sizes of clusters
  • :clusters - number of clusters
  • :predict - predicting function (see below), qualifies an additional sample
  • :representatives - list of centroids or medoids if available
  • :info - additional statistics for your samples (like distortion)
  • :obj - SMILE object

A cluster id is an integer ranging from 0 to the number of clusters minus 1. Some methods mark outliers with outlier-id.

The record acts as a function and can qualify an additional sample by calling the :predict function (or you can just call predict). For example (data is a sequence of 3d samples):

(let [cl (k-means data 10)] (cl [0 1 2]))

See k-means
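
A slightly longer sketch of reading the record and qualifying a new sample follows. It assumes fastmath is on the classpath in the version documented here; the sample data and the `clustering` alias are made up for illustration.

```clojure
(require '[fastmath.clustering :as clustering])

;; Made-up data: two well-separated groups of 3d samples.
(def data [[0 0 0] [0 1 0] [1 0 1] [1 1 0]
           [9 9 9] [9 8 9] [8 9 8] [8 8 9]])

(def result (clustering/k-means data 2))

;; The fields listed above are read with keyword access.
(:clusters result)    ; number of clusters
(:sizes result)       ; sizes of clusters
(:clustering result)  ; one cluster id per input sample

;; Three ways to qualify a new sample, following the description above.
(result [0 1 2])
((:predict result) [0 1 2])
(clustering/predict result [0 1 2])
```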

Regrouping

A clustering record can be regrouped into a list of individual clusters. Call regroup to get a list of maps with the following structure:

  • :key - cluster id
  • :data - samples which belong to the cluster
  • :outliers? - does it contain outliers or not
  • :representative - centroid/medoid or average vector if the former is not available
  • :size - size of cluster
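
As a usage sketch for the structure above, assuming fastmath is on the classpath and using made-up data, each regrouped cluster can be summarised through its keys:

```clojure
(require '[fastmath.clustering :as clustering])

;; Made-up data: two well-separated groups of 2d samples.
(def data [[0 0] [0 1] [1 0] [9 9] [9 8] [8 9]])

;; One map per non-empty cluster.
(def groups (clustering/regroup (clustering/k-means data 2)))

;; Summarise each cluster as [id size representative].
(map (juxt :key :size :representative) groups)
```

Working with the regrouped list is convenient when each cluster is processed separately, since the samples and the representative travel together in one map.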

clarans

(clarans data clusters)
(clarans data dist clusters)
(clarans data dist clusters max-neighbor)
(clarans data dist clusters max-neighbor num-local)

Clustering Large Applications based upon RANdomized Search algorithm.

Input:

  • data - sequence of samples
  • clusters - number of clusters

Optional:

  • dist - distance method, default :euclidean
  • max-neighbor - maximum number of neighbors checked during random search
  • num-local - the number of local minima to search for

See more in the SMILE doc: https://haifengl.github.io/smile/api/java/smile/clustering/CLARANS.html


clustering-methods-list

List of clustering methods.


ClusteringResult


dbscan

(dbscan data min-pts radius)
(dbscan data dist min-pts radius)

Density-Based Spatial Clustering of Applications with Noise algorithm.

Input:

  • data - sequence of samples
  • dist (optional) - distance method, default :euclidean
  • min-pts - minimum number of neighbors
  • radius - the neighborhood radius

See more in the SMILE doc: https://haifengl.github.io/smile/api/java/smile/clustering/DBSCAN.html
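
The sketch below shows one way to pick out samples treated as noise. It assumes fastmath is on the classpath and that dbscan is among the methods that mark outliers with outlier-id; the data and parameter values are made up for illustration.

```clojure
(require '[fastmath.clustering :as clustering])

;; Made-up data: two tight groups and one isolated point.
(def data [[0.0 0.0] [0.1 0.0] [0.0 0.1] [0.1 0.1]
           [5.0 5.0] [5.1 5.0] [5.0 5.1] [5.1 5.1]
           [50.0 50.0]])

;; min-pts = 3, radius = 0.5, default distance.
(def result (clustering/dbscan data 3 0.5))

;; Collect samples whose cluster id equals outlier-id.
(def noise
  (keep-indexed (fn [idx id]
                  (when (= id clustering/outlier-id) (nth data idx)))
                (:clustering result)))
```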


denclue

(denclue data sigma m)

DENsity CLUstering algorithm.

Input:

  • data - sequence of samples
  • sigma - gaussian kernel parameter
  • m - number of selected samples, much smaller than the number of all samples

See more in the SMILE doc: https://haifengl.github.io/smile/api/java/smile/clustering/DENCLUE.html


deterministic-annealing

(deterministic-annealing data max-clusters)
(deterministic-annealing data max-clusters alpha)

Deterministic Annealing algorithm.

Input:

  • data - sequence of samples
  • max-clusters - maximum number of clusters
  • alpha (optional) - temperature decreasing factor (valued from 0 to 1)

See more in the SMILE doc: https://haifengl.github.io/smile/api/java/smile/clustering/DeterministicAnnealing.html


g-means

(g-means data max-clusters)

G-Means algorithm.

Input:

  • data - sequence of samples
  • max-clusters - maximum number of clusters

See more in the SMILE doc: https://haifengl.github.io/smile/api/java/smile/clustering/GMeans.html


k-means

(k-means data clusters)
(k-means data clusters max-iter)
(k-means data clusters max-iter runs)

K-Means++ algorithm.

Input:

  • data - sequence of samples
  • clusters - number of clusters
  • max-iter (optional) - maximum number of iterations
  • runs (optional) - maximum number of runs

See more in the SMILE doc: https://haifengl.github.io/smile/api/java/smile/clustering/KMeans.html


mec

(mec data max-clusters radius)
(mec data dist max-clusters radius)

Nonparametric Minimum Conditional Entropy Clustering algorithm.

Input:

  • data - sequence of samples
  • dist (optional) - distance method, default :euclidean
  • max-clusters - maximum number of clusters
  • radius - the neighborhood radius

See more in the SMILE doc: https://haifengl.github.io/smile/api/java/smile/clustering/MEC.html


neural-gas

(neural-gas data clusters)
(neural-gas data clusters lambda-i lambda-f eps-i eps-f steps)

Neural Gas algorithm.

Input:

  • data - sequence of samples
  • clusters - number of clusters

Optional:

  • lambda-i - initial lambda value (soft learning radius/rate)
  • lambda-f - final lambda value
  • eps-i - initial epsilon value (learning rate)
  • eps-f - final epsilon value
  • steps - number of iterations

See more in the SMILE doc: https://haifengl.github.io/smile/api/java/smile/vq/NeuralGas.html


outlier-id

Id of the cluster which contains outliers.


predict

(predict cluster in)

Predict the cluster for a given vector.


regroup

(regroup clustered-data)

Transform a clustering result into a list of clusters as separate maps.

Every map contains:

  • :key - cluster id
  • :data - samples which belong to the cluster
  • :outliers? - does it contain outliers or not
  • :representative - centroid/medoid or average vector if the former is not available
  • :size - size of cluster

The representative is always an n-dimensional sequence, even if the input is a list of numbers.

Empty clusters are skipped.


x-means

(x-means data max-clusters)

X-Means algorithm.

Input:

  • data - sequence of samples
  • max-clusters - maximum number of clusters

See more in the SMILE doc: https://haifengl.github.io/smile/api/java/smile/clustering/XMeans.html

