Fast and Provably Good Seedings for k-Means is a paper by Olivier Bachem, Mario Lucic, S. Hamed Hassani, and Andreas Krause which introduces an improvement to the monte carlo markov chain approximation of k-means++ D^2 sampling. It accomplishes this by computing the D^2 sampling distribution with respect to the first cluster. This has the practical benefit of removing some of the assumptions, like choice of distance metric, which were imposed in the former framing. As such the name of this algorithm is assumption free k-mc^2. A savvy reader may note that by computing the D^2 sampling distribution as part of the steps this algorithm loses some of the theoretical advantages of the pure markov chain formulation. The paper argues that this is acceptable, because in practice computing the first D^2 sampling distribution ends up paying for itself by reducing the chain length necessary to get convergence guarantees.
Fast and Provably Good Seedings for k-Means is a paper by Olivier Bachem, Mario Lucic, S. Hamed Hassani, and Andreas Krause which introduces an improvement to the monte carlo markov chain approximation of k-means++ D^2 sampling. It accomplishes this by computing the D^2 sampling distribution with respect to the first cluster. This has the practical benefit of removing some of the assumptions, like choice of distance metric, which were imposed in the former framing. As such the name of this algorithm is assumption free k-mc^2. A savvy reader may note that by computing the D^2 sampling distribution as part of the steps this algorithm loses some of the theoretical advantages of the pure markov chain formulation. The paper argues that this is acceptable, because in practice computing the first D^2 sampling distribution ends up paying for itself by reducing the chain length necessary to get convergence guarantees.
Approximating K-Means in sublinear time is a paper written by Olivier Bachmem, Mario Lucic, Hamed Hassani, and Andreas Krause which shares a method for obtaining a provably good approximation of k-means++ in sublinear time. The method they share uses markov chain monte carlo sampling in order to approximate the D^2 sampling that is used in k-means++. Since this method is proven to converge to drawing from the same distribution as D^2 sampling in k-means++ the theoretical competitiveness guarantees of k-means++ are inherited. This algorithm is sublinear with respect to input size which makes it different from other variants of k-means++ like k-means||. Whereas a variant like k-means|| allows for a distributed k-means++ computation to be carried out across a cluster of computers, k-means-mc++ is better suited to running on a single machine.
Approximating K-Means in sublinear time is a paper written by Olivier Bachmem, Mario Lucic, Hamed Hassani, and Andreas Krause which shares a method for obtaining a provably good approximation of k-means++ in sublinear time. The method they share uses markov chain monte carlo sampling in order to approximate the D^2 sampling that is used in k-means++. Since this method is proven to converge to drawing from the same distribution as D^2 sampling in k-means++ the theoretical competitiveness guarantees of k-means++ are inherited. This algorithm is sublinear with respect to input size which makes it different from other variants of k-means++ like k-means||. Whereas a variant like k-means|| allows for a distributed k-means++ computation to be carried out across a cluster of computers, k-means-mc++ is better suited to running on a single machine.
A random initialization strategy for k means which lacks theoretical guarantees on solution quality for any individual run, but which will complete in O(n + kd) time and only takes O(kd) space.
A random initialization strategy for k means which lacks theoretical guarantees on solution quality for any individual run, but which will complete in O(n + k*d) time and only takes O(k*d) space.
cljdoc is a website building & hosting documentation for Clojure/Script libraries
× close