How much can k-means be improved by using better initialization and repeats?
Indexed incrossref
Abstract
In this paper, we study what are the most important factors that deteriorate the performance of the k-means algorithm, and how much this deterioration can be overcome either by using a better initialization technique, or by repeating (restarting) the algorithm. Our main finding is that when the clusters overlap, k-means can be significantly improved using these two tricks. Simple furthest point heuristic (Maxmin) reduces the number of erroneous clusters from 15% to 6%, on average, with our clustering benchmark. Repeating the algorithm 100 times reduces it further down to 1%. This accuracy is more than enough for most pattern recognition applications. However, when the data has well separated clusters, the…
Citation impact
446
total citations
- FWCI
- 32.35
- Percentile
- 100%
- References
- 104
Citations per year
Authors
2Topics & keywords
Topics
Keywords
- Initialization
- Cluster analysis
- Computer science
- Benchmark (surveying)
- Heuristic
- Algorithm
- Point (geometry)
- Pattern recognition (psychology)
No related works found for this paper.