All notable changes to this project will be documented in this file. This change log follows the conventions of keepachangelog.com.
- Fixed a bug that could cause points to be assigned to the wrong cluster.
- Expanded unit test coverage.
- Updated documentation to reflect usage.
- Improved ease of use: a default chain length is now chosen automatically when none is provided.
- Expanded test coverage with generative testing.
- Added support for saving and loading learned models.
- Assignments when calling k-means-seq no longer overwrite each other.
- Added .load-assignments and .classify to ClusterResult.
- Now support the mc^2 initialization method.
- Now support the afk mc^2 initialization method.
- The default calling convention now runs multiple instances of k-means clustering.
- k-means can now be called with an option to specify which distance function to use.
- k-means now supports many different distance functions.
- Now supporting initialization via k-means++.
- Now supporting initialization via k-means||.
- Now supporting initialization via naive uniform sampling.
- k-means can now be called with options to determine the initialization method.
- Now supporting parquet format.
- Now supporting arrow format.
- Now supporting arrows format.
- k-means can now be called with options to determine preferred file format.
- Added logs to help make progress of computations more obvious.
- Added the `initialize-centroids` multimethod; callers can now control the choice of initialization method via multimethods.
- Now supports calling k-means with lazy seqs that are ->dataset compatible.
- Fixed an issue with processing large datasets by moving from arrow files to arrows IPC format.
- Rebranded from clj-kmeans to josh.meanings, because I like the name meanings but figure I should namespace it.
- Dramatically improved performance on large datasets by switching from buffered CSV reading to optimized tech.ml.dataset usage.
- Initial project created with support for k-means clustering on larger than memory datasets.
- Only configuration provided is choice of k. Distance function used is earth mover distance.