How much can k-means be improved by using better initialization and repeats?

Fränti, Pasi; Sieranoja, Sami

doi:10.1016/j.patcog.2019.04.014

articlePattern RecognitionApr 15, 2019HYBRID OA

How much can k-means be improved by using better initialization and repeats?

PFPasi Fränti SSSami Sieranoja

University of Eastern Finland

Indexed incrossref

Abstract

In this paper, we study what are the most important factors that deteriorate the performance of the k-means algorithm, and how much this deterioration can be overcome either by using a better initialization technique, or by repeating (restarting) the algorithm. Our main finding is that when the clusters overlap, k-means can be significantly improved using these two tricks. Simple furthest point heuristic (Maxmin) reduces the number of erroneous clusters from 15% to 6%, on average, with our clustering benchmark. Repeating the algorithm 100 times reduces it further down to 1%. This accuracy is more than enough for most pattern recognition applications. However, when the data has well separated clusters, the…

Citation impact

446

total citations

FWCI: 32.35
Percentile: 100%
References: 104

Citations per year

Authors

2

Topics & keywords

Topics

Keywords

Initialization
Cluster analysis
Computer science
Benchmark (surveying)
Heuristic
Algorithm
Point (geometry)
Pattern recognition (psychology)

No related works found for this paper.