articlePattern RecognitionApr 15, 2019HYBRID OA

How much can k-means be improved by using better initialization and repeats?

University of Eastern Finland

Indexed incrossref

Abstract

In this paper, we study what are the most important factors that deteriorate the performance of the k-means algorithm, and how much this deterioration can be overcome either by using a better initialization technique, or by repeating (restarting) the algorithm. Our main finding is that when the clusters overlap, k-means can be significantly improved using these two tricks. Simple furthest point heuristic (Maxmin) reduces the number of erroneous clusters from 15% to 6%, on average, with our clustering benchmark. Repeating the algorithm 100 times reduces it further down to 1%. This accuracy is more than enough for most pattern recognition applications. However, when the data has well separated clusters, the…

Citation impact

446
total citations
FWCI
32.35
Percentile
100%
References
104
Citations per year

Authors

2

Topics & keywords

Keywords
  • Initialization
  • Cluster analysis
  • Computer science
  • Benchmark (surveying)
  • Heuristic
  • Algorithm
  • Point (geometry)
  • Pattern recognition (psychology)
No related works found for this paper.