fastmath.clustering

Clojure only.

clarans
clustering-methods-list
dbscan
denclue
deterministic-annealing
g-means
k-means
lloyd
mec
outlier-id
predict
regroup
spectral
x-means

Clustering.

Various clustering algorithms backed by the SMILE library.

Only partition clustering is implemented.

### Input data

Input is always a sequence of n-dimensional samples, each given as a sequence.

For example, 2d samples `[[1 2] [2 2] [3 3] ...]`

For 1d data you can pass either a sequence of numbers or a sequence of 1d seqs of numbers:

```clojure
[1 2 3]
;; or
[[1] [2] [3]]
```
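
As a minimal sketch of the two 1d input shapes described above, the snippet below passes the same made-up values once as a flat sequence and once as nested 1d samples. The alias `c` and the names `flat` and `nested` are illustrative only.

```clojure
(require '[fastmath.clustering :as c])

;; Made-up 1d values forming two well-separated groups.
(def flat [1.0 1.1 0.9 10.0 10.2 9.8])

;; The same values wrapped as 1d samples: [[1.0] [1.1] ...]
(def nested (mapv vector flat))

;; Both shapes are described as valid input.
(def cl-flat   (c/k-means flat 2))
(def cl-nested (c/k-means nested 2))

;; Each result holds one cluster id per input sample.
(count (:clustering cl-flat))
(count (:clustering cl-nested))
```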

### Distances

Some of the methods use distance functions. Use the [[fastmath.distance]] namespace to create one.

### Output

Every function returns a record which contains:

* `:type` - name of the method used
* `:data` - input data
* `:clustering` - sequence of cluster ids
* `:sizes` - sizes of clusters
* `:clusters` - number of clusters
* `:predict` - prediction function (see below) which assigns an additional sample to a cluster
* `:representatives` - list of centroids or averages
* `:info` - additional statistics for your samples (like distortion)
* `:obj` - SMILE object

A cluster id is an integer ranging from 0 to the number of clusters minus 1. Some methods mark outliers with [[outlier-id]].

The record acts as a function and can qualify an additional sample by calling the `:predict` function (or just call [[predict]]). For example (`data` is a sequence of 3d samples):

```clojure
(let [cl (k-means data 10)] (cl [0 1 2]))
```

See [[k-means]]
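
The following sketch shows one way to inspect the returned record, using only the keys listed above. The data is made up, and the alias `c` and the names `data` and `cl` are illustrative. Cluster ids depend on random initialization, so no specific output is shown.

```clojure
(require '[fastmath.clustering :as c])

;; Two well-separated groups of 2d samples (made-up data).
(def data [[0.0 0.1] [0.2 0.0] [0.1 0.2]
           [9.0 9.1] [9.2 9.0] [9.1 9.2]])

(def cl (c/k-means data 2))

(:type cl)        ; name of the method used
(:clustering cl)  ; one cluster id per input sample
(:sizes cl)       ; sizes of clusters
(:clusters cl)    ; number of clusters

;; Three ways to qualify a new sample, as described above:
(cl [0.0 0.0])              ; the record called as a function
((:predict cl) [0.0 0.0])   ; the stored prediction function
(c/predict cl [0.0 0.0])    ; the predict helper
```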

#### Regrouping

A clustering record can be regrouped into a list of individual clusters. Call [[regroup]] to get a list of maps with the following structure:

* `:key` - cluster id or `:outliers`
* `:data` - samples which belong to the cluster
* `:representative` - centroid or average vector if the former is not available
* `:size` - size of cluster
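
A short sketch of working with the regrouped result, assuming the map structure listed above. The data is made up, and the names `data`, `groups` and `by-key` are illustrative.

```clojure
(require '[fastmath.clustering :as c])

;; Two well-separated groups of 2d samples (made-up data).
(def data [[0.0 0.1] [0.2 0.0] [0.1 0.2]
           [9.0 9.1] [9.2 9.0] [9.1 9.2]])

(def groups (c/regroup (c/k-means data 2)))

;; Print one summary line per cluster.
(doseq [{:keys [key size representative]} groups]
  (println "cluster" key "size" size "representative" (vec representative)))

;; Build a lookup from cluster key to the samples in that cluster.
(def by-key (into {} (map (juxt :key :data)) groups))
```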

clarans^clj

(clarans data clusters)

(clarans data dist clusters)

(clarans data dist clusters max-neighbor)

Clustering Large Applications based upon RANdomized Search algorithm.

Input:

* data - sequence of samples
* dist (optional) - distance method, default `euclidean`
* clusters - number of clusters
* max-neighbor (optional) - maximum number of neighbors checked during random search

See more in [SMILE doc](https://haifengl.github.io/api/java/smile/clustering/CLARANS.html)

clustering-methods-list^clj

List of clustering methods.

dbscan^clj

(dbscan data min-pts radius)

(dbscan data dist min-pts radius)

Density-Based Spatial Clustering of Applications with Noise algorithm.

Input:

* data - sequence of samples
* dist (optional) - distance method, default `euclidean`
* min-pts - minimum number of neighbors
* radius - the neighborhood radius

See more in [SMILE doc](https://haifengl.github.io/api/java/smile/clustering/DBSCAN.html)
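
A minimal sketch of picking out the samples that DBSCAN marks as noise, by comparing each cluster id against `outlier-id`. The data and the `min-pts` and `radius` values are made up for illustration, and the names `data`, `cl` and `noise` are not part of the library.

```clojure
(require '[fastmath.clustering :as c])

;; Two dense groups plus one isolated point (made-up data).
(def data [[0.0 0.0] [0.1 0.0] [0.0 0.1] [0.1 0.1]
           [5.0 5.0] [5.1 5.0] [5.0 5.1] [5.1 5.1]
           [50.0 50.0]])

;; min-pts = 2 and radius = 0.5 are illustrative values only.
(def cl (c/dbscan data 2 0.5))

;; Pair every sample with its cluster id and keep those flagged as outliers.
(def noise
  (->> (map vector data (:clustering cl))
       (filter (fn [[_ id]] (= id c/outlier-id)))
       (map first)))
```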

denclue^clj

(denclue data sigma m)

(denclue data sigma m tolerance)

(denclue data sigma m tolerance min-pts)

DENsity CLUstering algorithm.

Input:

* data - sequence of samples
* sigma - gaussian kernel parameter
* m - number of selected samples, much smaller than number of all samples
* tolerance (optional) - tolerance of hill-climbing procedure
* min-pts (optional) - minimum number of neighbors for a core attractor

See more in [SMILE doc](https://haifengl.github.io/api/java/smile/clustering/DENCLUE.html)

deterministic-annealing^clj

(deterministic-annealing data max-clusters)

(deterministic-annealing data max-clusters alpha)

(deterministic-annealing data max-clusters alpha max-iter)

(deterministic-annealing data max-clusters alpha max-iter tolerance)

(deterministic-annealing data
                         max-clusters
                         alpha
                         max-iter
                         tolerance
                         split-tolerance)

Deterministic Annealing algorithm.

Input:

* data - sequence of samples
* max-clusters - maximum number of clusters
* alpha (optional) - temperature decreasing factor (valued from 0 to 1)
* max-iter (optional) - maximum number of iterations
* tolerance (optional) - tolerance of convergence test
* split-tolerance (optional) - tolerance to split a cluster

See more in [SMILE doc](https://haifengl.github.io/api/java/smile/clustering/DeterministicAnnealing.html)

g-means^clj

(g-means data clusters)

(g-means data clusters max-iter)

(g-means data clusters max-iter tolerance)

G-Means

Input:

* data - sequence of samples
* clusters - maximum number of clusters
* max-iter (optional) - maximum number of iterations
* tolerance (optional) - tolerance of convergence test

See more in [SMILE doc](https://haifengl.github.io/api/java/smile/clustering/GMeans.html)

k-means^clj

(k-means data clusters)

(k-means data clusters max-iter)

(k-means data clusters max-iter tolerance)

K-Means++ algorithm.

Input:

* data - sequence of samples
* clusters - number of clusters
* max-iter (optional) - maximum number of iterations
* tolerance (optional) - tolerance of convergence test

See more in [SMILE doc](https://haifengl.github.io/api/java/smile/clustering/KMeans.html)

lloyd^clj

(lloyd data clusters)

(lloyd data clusters max-iter)

(lloyd data clusters max-iter tolerance)

K-Means algorithm (Lloyd's algorithm).

Input:

* data - sequence of samples
* clusters - number of clusters
* max-iter (optional) - maximum number of iterations
* tolerance (optional) - tolerance of convergence test

See more in [SMILE doc](https://haifengl.github.io/api/java/smile/clustering/KMeans.html)

mec^clj

(mec data max-clusters radius)

(mec data dist max-clusters radius)

Nonparametric Minimum Conditional Entropy Clustering algorithm.

Input:

* data - sequence of samples
* dist (optional) - distance method, default `:euclidean`
* max-clusters - maximum number of clusters
* radius - the neighborhood radius

See more in [SMILE doc](https://haifengl.github.io/api/java/smile/clustering/MEC.html)

outlier-id^clj

Id of the cluster which contains outliers.

predict^clj

(predict cluster in)

Predict the cluster for a given vector.

regroup^clj

(regroup {:keys [clustering data representatives sizes]})

Transform clustering result into list of clusters as separate maps.

Every map contains:

* `:key` - cluster id or `:outliers`
* `:data` - samples which belong to the cluster
* `:representative` - centroid/medoid or average vector if the former is not available
* `:size` - size of cluster

The representative is always an n-dimensional sequence, even if the input is a list of numbers.

Empty clusters are skipped.

spectral^clj

(spectral data clusters sigma)

(spectral data clusters samples sigma)

(spectral data clusters sigma max-iters tolerance)

(spectral data clusters samples sigma max-iters tolerance)

Spectral clustering

Input:

* data - sequence of samples
* clusters - number of clusters
* samples (optional) - number of random samples for Nystrom approximation
* sigma - width parameter for Gaussian kernel
* max-iters (optional) - maximum number of iterations
* tolerance (optional) - tolerance of k-means convergence test

See more in [SMILE doc](https://haifengl.github.io/api/java/smile/clustering/SpectralClustering.html)

x-means^clj

(x-means data clusters)

(x-means data clusters max-iter)

(x-means data clusters max-iter tolerance)

X-Means

Input:

* data - sequence of samples
* clusters - maximum number of clusters
* max-iter (optional) - maximum number of iterations
* tolerance (optional) - tolerance of convergence test

See more in [SMILE doc](https://haifengl.github.io/api/java/smile/clustering/XMeans.html)
