An efficient k-means clustering algorithm: analysis and implementation

IBM (United States)

Indexed incrossref

Abstract

In k-means clustering, we are given a set of n data points in d-dimensional space R/sup d/ and an integer k and the problem is to determine a set of k points in Rd, called centers, so as to minimize the mean squared distance from each data point to its nearest center. A popular heuristic for k-means clustering is Lloyd's (1982) algorithm. We present a simple and efficient implementation of Lloyd's k-means clustering algorithm, which we call the filtering algorithm. This algorithm is easy to implement, requiring a kd-tree as the only major data structure. We establish the practical efficiency of the filtering algorithm in two ways. First, we present a data-sensitive analysis of the algorithm's running time,…

Citation impact

5,631
total citations
FWCI
25.75
Percentile
100%
References
69
Citations per year

Authors

6

Topics & keywords

Keywords
  • Cluster analysis
  • Computer science
  • Algorithm
  • Canopy clustering algorithm
  • Data stream clustering
  • CURE data clustering algorithm
  • Ramer–Douglas–Peucker algorithm
  • Data compression
No related works found for this paper.

Funding