fastmath.clustering

Clustering.

Various clustering algorithms backed by the SMILE library.

Only partition clustering is implemented.

Input data

Input is always a sequence of samples, where each sample is a sequence of n numbers.

For example, 2d samples: [[1 2] [2 2] [3 3] ...]

For 1d data you can pass either a sequence of numbers or a sequence of 1d sequences of numbers:

[1 2 3]
;; or
[[1] [2] [3]]
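
One way to picture why the two 1d forms are interchangeable is a small normalizing step that wraps bare numbers into one-element vectors. The following is a minimal sketch in plain Clojure; `ensure-samples` is a hypothetical helper, not part of fastmath, and it makes no claim about how the library handles this internally.

```clojure
;; Hypothetical helper (not part of fastmath): wrap bare numbers so that
;; every sample is a vector, leaving already-sequential samples as they are.
(defn ensure-samples
  [data]
  (mapv (fn [sample]
          (if (sequential? sample) (vec sample) [sample]))
        data))

(ensure-samples [1 2 3])        ;; => [[1] [2] [3]]
(ensure-samples [[1] [2] [3]])  ;; => [[1] [2] [3]]
```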

Distances

Some of the methods use distance functions. Use the fastmath.distance namespace to create one.

Output

Every function returns a record which contains:

  • :type - name of the method used
  • :data - input data
  • :clustering - sequence of cluster ids
  • :sizes - sizes of clusters
  • :clusters - number of clusters
  • :predict - predicting function (see below), which qualifies an additional sample
  • :representatives - list of centroids or averages
  • :info - additional statistics for your samples (like distortion)
  • :obj - SMILE object

A cluster id is an integer ranging from 0 to the number of clusters minus 1. Some methods mark outliers with outlier-id.

The record acts as a function and can qualify an additional sample by calling the :predict function (or you can just call predict). For example (data is a sequence of 3d samples):

(let [cl (k-means data 10)] (cl [0 1 2]))

See k-means
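
To illustrate what "acts as a function" means in Clojure terms, here is a self-contained toy sketch. `ToyClustering`, `nearest-index` and `squared-distance` are hypothetical names introduced for this example, and nearest-representative lookup is only an assumed stand-in for a real predicting function; this is not fastmath's implementation.

```clojure
;; Toy illustration only; none of these names come from fastmath.
(defn squared-distance
  [a b]
  (reduce + (map (fn [x y] (let [d (- x y)] (* d d))) a b)))

(defn nearest-index
  "Index of the representative closest to `sample`."
  [representatives sample]
  (first (apply min-key
                (fn [[_ r]] (squared-distance r sample))
                (map-indexed vector representatives))))

;; A record that implements IFn (one-argument arity here) can be called
;; like a function; the call is forwarded to the stored :predict function.
(defrecord ToyClustering [representatives predict]
  clojure.lang.IFn
  (invoke [_ sample] (predict sample)))

(def cl
  (let [reps [[0 0 0] [10 10 10]]]
    (->ToyClustering reps (fn [sample] (nearest-index reps sample)))))

(cl [0 1 2])             ;; => 0
((:predict cl) [9 9 8])  ;; => 1
```

Calling the record and calling the function stored under `:predict` give the same answer, which is the behaviour the paragraph above describes.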

Regrouping

A clustering record can be regrouped into a list of individual clusters. Call regroup to get a list of maps with the following structure:

  • :key - cluster id or :outliers
  • :data - samples which belong to the cluster
  • :representative - centroid or average vector if the former is not available
  • :size - size of cluster

clarans

(clarans data clusters)
(clarans data dist clusters)
(clarans data dist clusters max-neighbor)

Clustering Large Applications based upon RANdomized Search algorithm.

Input:

  • data - sequence of samples
  • dist (optional) - distance method, default euclidean
  • clusters - number of clusters
  • max-neighbor (optional) - maximum number of neighbors checked during random search

See more in the SMILE doc: https://haifengl.github.io/api/java/smile/clustering/CLARANS.html

clustering-methods-list

List of clustering methods.

dbscan

(dbscan data min-pts radius)
(dbscan data dist min-pts radius)

Density-Based Spatial Clustering of Applications with Noise algorithm.

Input:

  • data - sequence of samples
  • dist (optional) - distance method, default euclidean
  • min-pts - minimum number of neighbors
  • radius - the neighborhood radius

See more in the SMILE doc: https://haifengl.github.io/api/java/smile/clustering/DBSCAN.html
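
The roles of min-pts and radius can be illustrated with the usual density test: a sample is treated as a core point when enough other samples lie within the given radius. The sketch below is plain Clojure, not SMILE's code. `euclidean`, `neighbors` and `core-point?` are hypothetical helpers, and the choice not to count the sample itself is an assumption of this sketch rather than a documented behaviour.

```clojure
;; Plain-Clojure sketch of the neighborhood test; not SMILE's implementation.
(defn euclidean
  [a b]
  (Math/sqrt (reduce + (map (fn [x y] (let [d (- x y)] (* d d))) a b))))

(defn neighbors
  "Samples different from `p` that lie within `radius` of `p`."
  [data p radius]
  (filter (fn [q] (and (not= p q) (<= (euclidean p q) radius))) data))

(defn core-point?
  "True when `p` has at least `min-pts` neighbors within `radius`.
  Assumption of this sketch: `p` itself is not counted."
  [data p min-pts radius]
  (>= (count (neighbors data p radius)) min-pts))

(def data [[0 0] [0 1] [1 0] [1 1] [10 10]])

(core-point? data [0 0] 2 1.5)    ;; => true, three neighbors in range
(core-point? data [10 10] 2 1.5)  ;; => false, isolated sample
```

A larger radius or a smaller min-pts makes more samples qualify as core points, so fewer samples end up marked as noise.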

denclue

(denclue data sigma m)
(denclue data sigma m tolerance)
(denclue data sigma m tolerance min-pts)

DENsity CLUstering algorithm.

Input:

  • data - sequence of samples
  • sigma - gaussian kernel parameter
  • m - number of selected samples, much smaller than number of all samples
  • tolerance (optional) - tolerance of hill-climbing procedure
  • min-pts (optional) - minimum number of neighbors for a core attractor

See more in the SMILE doc: https://haifengl.github.io/api/java/smile/clustering/DENCLUE.html

deterministic-annealing

(deterministic-annealing data max-clusters)
(deterministic-annealing data max-clusters alpha)
(deterministic-annealing data max-clusters alpha max-iter)
(deterministic-annealing data max-clusters alpha max-iter tolerance)
(deterministic-annealing data
                         max-clusters
                         alpha
                         max-iter
                         tolerance
                         split-tolerance)

Deterministic Annealing algorithm.

Input:

  • data - sequence of samples
  • max-clusters - maximum number of clusters
  • alpha (optional) - temperature decreasing factor (valued from 0 to 1)
  • max-iter (optional) - maximum number of iterations
  • tolerance (optional) - tolerance of convergence test
  • split-tolerance (optional) - tolerance to split a cluster

See more in the SMILE doc: https://haifengl.github.io/api/java/smile/clustering/DeterministicAnnealing.html

g-means

(g-means data clusters)
(g-means data clusters max-iter)
(g-means data clusters max-iter tolerance)

G-Means

Input:

  • data - sequence of samples
  • clusters - number of clusters
  • max-iter (optional) - maximum number of iterations
  • tolerance (optional) - tolerance of convergence test

See more in the SMILE doc: https://haifengl.github.io/api/java/smile/clustering/GMeans.html

k-means

(k-means data clusters)
(k-means data clusters max-iter)
(k-means data clusters max-iter tolerance)

K-Means++ algorithm.

Input:

  • data - sequence of samples
  • clusters - number of clusters
  • max-iter (optional) - maximum number of iterations
  • tolerance (optional) - tolerance of convergence test

See more in the SMILE doc: https://haifengl.github.io/api/java/smile/clustering/KMeans.html

lloyd

(lloyd data clusters)
(lloyd data clusters max-iter)
(lloyd data clusters max-iter tolerance)

K-Means algorithm (Lloyd).

Input:

  • data - sequence of samples
  • clusters - number of clusters
  • max-iter (optional) - maximum number of iterations
  • tolerance (optional) - tolerance of convergence test

See more in the SMILE doc: https://haifengl.github.io/api/java/smile/clustering/KMeans.html
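
For readers unfamiliar with Lloyd's iteration, the following plain-Clojure sketch shows the textbook loop: assign each sample to its nearest centroid, move each centroid to the mean of its samples, and stop when nothing changes or max-iter is reached. It is not SMILE's code. All function names are hypothetical, the initial centroids are passed in explicitly (the library function takes a cluster count instead), and the tolerance-based convergence test is replaced by an exact equality check to keep the sketch short.

```clojure
;; Textbook Lloyd iteration in plain Clojure; names are hypothetical.
(defn squared-distance
  [a b]
  (reduce + (map (fn [x y] (let [d (- x y)] (* d d))) a b)))

(defn nearest-index
  "Index of the centroid closest to `sample`."
  [centroids sample]
  (first (apply min-key
                (fn [[_ c]] (squared-distance c sample))
                (map-indexed vector centroids))))

(defn mean-vector
  "Component-wise mean of a non-empty collection of samples."
  [samples]
  (let [n (count samples)]
    (apply mapv (fn [& xs] (/ (reduce + xs) (double n))) samples)))

(defn lloyd-step
  "Assign every sample to its nearest centroid, then move each centroid
  to the mean of its samples. A centroid without samples stays in place."
  [centroids data]
  (let [groups (group-by #(nearest-index centroids %) data)]
    (vec (map-indexed (fn [i c]
                        (if-let [members (get groups i)]
                          (mean-vector members)
                          c))
                      centroids))))

(defn lloyd-sketch
  "Repeat `lloyd-step` until the centroids stop changing or `max-iter`
  iterations have run."
  [data initial-centroids max-iter]
  (loop [centroids initial-centroids, i 0]
    (let [updated (lloyd-step centroids data)]
      (if (or (= updated centroids) (>= (inc i) max-iter))
        updated
        (recur updated (inc i))))))

(def data [[1 1] [1 2] [8 8] [9 8]])

(lloyd-sketch data [[0.0 0.0] [10.0 10.0]] 10)
;; => [[1.0 1.5] [8.5 8.0]]
```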

mec

(mec data max-clusters radius)
(mec data dist max-clusters radius)

Nonparametric Minimum Conditional Entropy Clustering algorithm.

Input:

  • data - sequence of samples
  • dist (optional) - distance method, default :euclidean
  • max-clusters - maximum number of clusters
  • radius - the neighborhood radius

See more in the SMILE doc: https://haifengl.github.io/api/java/smile/clustering/MEC.html

outlier-id

Id of the cluster which contains outliers.

predict

(predict cluster in)

Predict the cluster for a given vector.

regroup

(regroup {:keys [clustering data representatives sizes]})

Transform a clustering result into a list of clusters as separate maps.

Every map contains:

  • :key - cluster id or :outliers
  • :data - samples which belong to the cluster
  • :representative - centroid/medoid or average vector if the former is not available
  • :size - size of cluster

The representative is always an n-dimensional sequence, even if the input is a list of numbers.

Empty clusters are skipped.
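
The shape of the result can be sketched in plain Clojure. This is an illustration of the described output, not the library's code. `regroup-sketch` and `mean-vector` are hypothetical names, `toy-outlier-id` is a made-up placeholder for the real `outlier-id` (whose value is not assumed here), samples are expected to be vectors already, and `:representatives` is expected to be a vector indexed by cluster id.

```clojure
;; Made-up placeholder; the library's real outlier id may differ.
(def toy-outlier-id -1)

(defn mean-vector
  "Component-wise mean of a non-empty collection of samples."
  [samples]
  (let [n (count samples)]
    (apply mapv (fn [& xs] (/ (reduce + xs) (double n))) samples)))

(defn regroup-sketch
  "Group samples by cluster id. Ids that own no samples never show up,
  so empty clusters are skipped. When no representative is stored for an
  id (as for outliers), the average vector of the samples is used."
  [{:keys [clustering data representatives]}]
  (let [by-id (group-by first (map vector clustering data))]
    (for [id (sort (keys by-id))
          :let [samples (mapv second (by-id id))]]
      {:key            (if (= id toy-outlier-id) :outliers id)
       :data           samples
       :representative (or (get representatives id) (mean-vector samples))
       :size           (count samples)})))

(regroup-sketch {:clustering      [0 1 0 -1]
                 :data            [[1 1] [9 9] [1 3] [50 50]]
                 :representatives [[1.0 2.0] [9.0 9.0]]})
;; => ({:key :outliers, :data [[50 50]], :representative [50.0 50.0], :size 1}
;;     {:key 0, :data [[1 1] [1 3]], :representative [1.0 2.0], :size 2}
;;     {:key 1, :data [[9 9]], :representative [9.0 9.0], :size 1})
```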

spectral

(spectral data clusters sigma)
(spectral data clusters samples sigma)
(spectral data clusters sigma max-iters tolerance)
(spectral data clusters samples sigma max-iters tolerance)

Spectral clustering

Input:

  • data - sequence of samples
  • clusters - number of clusters
  • samples (optional) - number of random samples for Nystrom approximation
  • sigma - width parameter for Gaussian kernel
  • max-iters (optional) - maximum number of iterations
  • tolerance (optional) - tolerance of k-means convergence test

See more in the SMILE doc: https://haifengl.github.io/api/java/smile/clustering/SpectralClustering.html

x-means

(x-means data clusters)
(x-means data clusters max-iter)
(x-means data clusters max-iter tolerance)

X-Means

Input:

  • data - sequence of samples
  • clusters - number of clusters
  • max-iter (optional) - maximum number of iterations
  • tolerance (optional) - tolerance of convergence test

See more in the SMILE doc: https://haifengl.github.io/api/java/smile/clustering/XMeans.html
