fastmath.clustering

Clojure only.

clarans
clustering-methods-list
dbscan
denclue
deterministic-annealing
g-means
k-means
mec
neural-gas
outlier-id
predict
regroup
x-means

Clustering algorithms.

Various clustering algorithms backed by the SMILE library.

Currently implemented: only partition clustering.

### Input data

Input is always a sequence of samples, where each sample is itself a sequence of n numbers.

For example, 2d samples: `[[1 2] [2 2] [3 3] ...]`

For 1d data you can pass either a sequence of numbers or a sequence of 1d sequences of numbers:

```clojure
[1 2 3]
;; or
[[1] [2] [3]]
```
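
Both 1d forms describe the same samples. As a minimal sketch of that equivalence, the hypothetical helper `as-samples` below (not part of fastmath, written only for illustration) wraps bare numbers into one-element vectors so both forms end up with the same shape:

```clojure
;; Hypothetical helper, not part of fastmath: wraps bare numbers into
;; one-element vectors so that both 1d input forms have the same shape.
(defn as-samples
  [data]
  (mapv (fn [sample]
          (if (number? sample) [sample] (vec sample)))
        data))

(as-samples [1 2 3])       ;; => [[1] [2] [3]]
(as-samples [[1] [2] [3]]) ;; => [[1] [2] [3]]
```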

### Distances

Some of the methods use distance functions. Use the [[fastmath.distance]] namespace to create one.

### Output

Every function returns a record which contains:

* `:type` - name of the method used
* `:data` - input data
* `:clustering` - sequence of cluster ids
* `:sizes` - sizes of clusters
* `:clusters` - number of clusters
* `:predict` - predicting function (see below) which qualifies an additional sample
* `:representatives` - list of centroids or medoids if available
* `:info` - additional statistics for your samples (like distortion)
* `:obj` - SMILE object

Cluster id is an integer ranging from 0 to the number of clusters minus 1. Some methods mark outliers with [[outlier-id]].

The record acts as a function and can qualify an additional sample by calling the `:predict` function (or you can just call [[predict]]). For example (`data` is a sequence of 3d samples):

```clojure
(let [cl (k-means data 10)] (cl [0 1 2]))
```

See [[k-means]]
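
The record fields listed above can be read like any map keys. The following is a minimal sketch, assuming fastmath is on the classpath and required under the alias `cl` (the alias and the random data are illustrative choices, not something this page prescribes):

```clojure
(require '[fastmath.clustering :as cl])

;; Illustrative data: 200 random 3d samples in the unit cube.
(def data (vec (repeatedly 200 #(vector (rand) (rand) (rand)))))

(def result (cl/k-means data 10))

(:type result)                ;; name of the method used
(:clusters result)            ;; number of clusters
(:sizes result)               ;; sizes of clusters
(take 5 (:clustering result)) ;; cluster ids of the first five samples

;; Three ways to qualify an additional sample, as described above:
(result [0 1 2])
((:predict result) [0 1 2])
(cl/predict result [0 1 2])
```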

#### Regrouping

A clustering record can be regrouped into a list of individual clusters. Call [[regroup]] to get a list of maps with the following structure:

* `:key` - cluster id
* `:data` - samples which belong to the cluster
* `:outliers?` - does it contain outliers or not
* `:representative` - centroid/medoid or average vector if the former is not available
* `:size` - size of cluster
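
To make the shape of these maps concrete, here is a sketch in plain Clojure that builds the same structure from `:data` and `:clustering`. It is not the library's implementation of `regroup`: the function name `regroup-sketch` is hypothetical, the representative is always the average vector (the fallback case), and the outlier id is passed in as a parameter, with `-1` used below only as a placeholder value.

```clojure
;; Sketch only: mimics the shape of the maps described above using plain
;; Clojure. It is not the library's implementation of regroup.
(defn regroup-sketch
  [{:keys [data clustering]} outlier-key]
  (let [mean (fn [samples]
               ;; component-wise average of the samples
               (let [n (count samples)]
                 (apply mapv (fn [& xs] (/ (reduce + xs) n)) samples)))]
    (for [[k pairs] (sort-by key (group-by first (map vector clustering data)))
          :let [samples (mapv second pairs)]]
      {:key k
       :data samples
       :outliers? (= k outlier-key)
       :representative (mean samples)
       :size (count samples)})))

;; -1 is a placeholder outlier id for this sketch only.
(regroup-sketch {:data [[0 0] [2 2] [10 10]]
                 :clustering [0 0 1]}
                -1)
;; => ({:key 0, :data [[0 0] [2 2]], :outliers? false, :representative [1 1], :size 2}
;;     {:key 1, :data [[10 10]], :outliers? false, :representative [10 10], :size 1})
```

Because the grouping starts from the ids that actually occur in `:clustering`, a cluster with no samples never produces a map in this sketch.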

clarans^clj

(clarans data clusters)

(clarans data dist clusters)

(clarans data dist clusters max-neighbor)

(clarans data dist clusters max-neighbor num-local)

Clustering Large Applications based upon RANdomized Search algorithm.

Input:

* data - sequence of samples
* clusters - number of clusters

Optional:

* dist - distance method, default `:euclidean`
* max-neighbor - maximum number of neighbors checked during random search
* num-local - the number of local minima to search for

See more in [SMILE doc](https://haifengl.github.io/smile/api/java/smile/clustering/CLARANS.html)

clustering-methods-list^clj

List of clustering methods.

dbscan^clj

(dbscan data min-pts radius)

(dbscan data dist min-pts radius)

Density-Based Spatial Clustering of Applications with Noise algorithm.

Input:

* data - sequence of samples
* dist (optional) - distance method, default `:euclidean`
* min-pts - minimum number of neighbors
* radius - the neighborhood radius

See more in [SMILE doc](https://haifengl.github.io/smile/api/java/smile/clustering/DBSCAN.html)
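
Since DBSCAN separates noise from clusters, a typical follow-up is to pick out the samples marked with `outlier-id`. The following is a minimal sketch, assuming fastmath is on the classpath under the alias `cl`; the data, `min-pts` of 5 and `radius` of 1.0 are arbitrary values chosen for illustration:

```clojure
(require '[fastmath.clustering :as cl])

;; Illustrative data: two dense blobs plus one far-away point.
(def data
  (vec (concat (repeatedly 50 #(vector (rand) (rand)))
               (repeatedly 50 #(vector (+ 10 (rand)) (+ 10 (rand))))
               [[100.0 100.0]])))

;; min-pts = 5 and radius = 1.0 are arbitrary values chosen for this data.
(def result (cl/dbscan data 5 1.0))

;; Samples whose cluster id equals outlier-id, i.e. samples marked as outliers.
(def noise
  (->> (map vector (:clustering result) data)
       (filter (fn [[id _]] (= id cl/outlier-id)))
       (map second)))
```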

denclue^clj

(denclue data sigma m)

DENsity CLUstering algorithm.

Input:

* data - sequence of samples
* sigma - gaussian kernel parameter
* m - number of selected samples, should be much smaller than the number of all samples

See more in [SMILE doc](https://haifengl.github.io/smile/api/java/smile/clustering/DENCLUE.html)

deterministic-annealing^clj

(deterministic-annealing data max-clusters)

(deterministic-annealing data max-clusters alpha)

Deterministic Annealing algorithm.

Input:

* data - sequence of samples
* max-clusters - maximum number of clusters
* alpha (optional) - temperature decreasing factor (valued from 0 to 1)

See more in [SMILE doc](https://haifengl.github.io/smile/api/java/smile/clustering/DeterministicAnnealing.html)

g-means^clj

(g-means data max-clusters)

G-Means algorithm.

Input:

* data - sequence of samples
* max-clusters - maximum number of clusters

See more in [SMILE doc](https://haifengl.github.io/smile/api/java/smile/clustering/GMeans.html)

k-means^clj

(k-means data clusters)

(k-means data clusters max-iter)

(k-means data clusters max-iter runs)

K-Means++ algorithm.

Input:

* data - sequence of samples
* clusters - number of clusters
* max-iter (optional) - maximum number of iterations
* runs (optional) - maximum number of runs

See more in [SMILE doc](https://haifengl.github.io/smile/api/java/smile/clustering/KMeans.html)
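
The optional arguments can be exercised as follows. This is a minimal sketch, assuming fastmath is on the classpath under the alias `cl`; the random data and the values 100 and 5 are arbitrary choices for illustration:

```clojure
(require '[fastmath.clustering :as cl])

;; Illustrative data: 150 random 2d samples.
(def data (vec (repeatedly 150 #(vector (rand) (rand)))))

;; 3 clusters, max-iter = 100, runs = 5.
(def result (cl/k-means data 3 100 5))

(:representatives result) ;; representatives (centroids for k-means)
(:info result)            ;; additional statistics reported for this clustering
```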

mec^clj

(mec data max-clusters radius)

(mec data dist max-clusters radius)

Nonparametric Minimum Conditional Entropy Clustering algorithm.

Input:

* data - sequence of samples
* dist (optional) - distance method, default `:euclidean`
* max-clusters - maximum number of clusters
* radius - the neighborhood radius

See more in [SMILE doc](https://haifengl.github.io/smile/api/java/smile/clustering/MEC.html)

neural-gas^clj

(neural-gas data clusters)

(neural-gas data clusters lambda-i lambda-f eps-i eps-f steps)

Neural Gas algorithm.

Input:

* data - sequence of samples
* clusters - number of clusters

Optional:

* lambda-i - initial lambda value (soft learning radius/rate)
* lambda-f - final lambda value
* eps-i - initial epsilon value (learning rate)
* eps-f - final epsilon value
* steps - number of iterations

See more in [SMILE doc](https://haifengl.github.io/smile/api/java/smile/vq/NeuralGas.html)

outlier-id^clj

Id of the cluster which contains outliers.

predict^clj

(predict cluster in)

Predict the cluster for a given vector.

regroup^clj

(regroup clustered-data)

Transform a clustering result into a list of clusters as separate maps.

Every map contains:

* `:key` - cluster id
* `:data` - samples which belong to the cluster
* `:outliers?` - does it contain outliers or not
* `:representative` - centroid/medoid or average vector if the former is not available
* `:size` - size of cluster

The representative is always an n-dimensional sequence, even if the input is a list of numbers.

Empty clusters are skipped.
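
A typical use is to walk over the regrouped clusters and read the keys listed above. This is a minimal sketch, assuming fastmath is on the classpath under the alias `cl`; the random data and the choice of `k-means` with 4 clusters are only for illustration:

```clojure
(require '[fastmath.clustering :as cl])

;; Illustrative data: 100 random 2d samples.
(def data (vec (repeatedly 100 #(vector (rand) (rand)))))

(def groups (cl/regroup (cl/k-means data 4)))

;; Print one line per non-empty cluster.
(doseq [{k :key size :size representative :representative outliers? :outliers?} groups]
  (println "cluster" k "size" size "representative" representative "outliers?" outliers?))
```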

x-means^clj

(x-means data max-clusters)

X-Means algorithm.

Input:

* data - sequence of samples
* max-clusters - maximum number of clusters

See more in [SMILE doc](https://haifengl.github.io/smile/api/java/smile/clustering/XMeans.html)
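
Because the algorithm picks the number of clusters itself, the second argument acts as an upper bound rather than a target. This is a minimal sketch, assuming fastmath is on the classpath under the alias `cl`; the three-blob data and the bound of 10 are arbitrary choices for illustration:

```clojure
(require '[fastmath.clustering :as cl])

;; Illustrative data: three well-separated 2d blobs of 60 samples each.
(def data
  (vec (for [[cx cy] [[0 0] [10 0] [0 10]]
             _ (range 60)]
         [(+ cx (rand)) (+ cy (rand))])))

;; Allow up to 10 clusters; the algorithm decides how many to use.
(def result (cl/x-means data 10))

(:clusters result) ;; number of clusters actually found
(:sizes result)    ;; sizes of those clusters
```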
