An efficient k-means clustering algorithm: analysis and implementation

Kanungo, Tapas; Mount, David M.; Netanyahu, Nathan S.; Piatko, Christine; Silverman, Ruth; Wu, Angela Y.

doi:10.1109/tpami.2002.1017616

articleIEEE Transactions on Pattern Analysis and Machine IntelligenceJul 1, 2002Closed access

An efficient k-means clustering algorithm: analysis and implementation

TKTapas Kanungo DMDavid M. Mount NSNathan S. Netanyahu CPChristine Piatko RSRuth Silverman

IBM (United States)

Indexed incrossref

Abstract

In k-means clustering, we are given a set of n data points in d-dimensional space R/sup d/ and an integer k and the problem is to determine a set of k points in Rd, called centers, so as to minimize the mean squared distance from each data point to its nearest center. A popular heuristic for k-means clustering is Lloyd's (1982) algorithm. We present a simple and efficient implementation of Lloyd's k-means clustering algorithm, which we call the filtering algorithm. This algorithm is easy to implement, requiring a kd-tree as the only major data structure. We establish the practical efficiency of the filtering algorithm in two ways. First, we present a data-sensitive analysis of the algorithm's running time,…

Citation impact

5,631

total citations

FWCI: 25.75
Percentile: 100%
References: 69

Citations per year

Authors

6

Topics & keywords

Topics

Keywords

Cluster analysis
Computer science
Algorithm
Canopy clustering algorithm
Data stream clustering
CURE data clustering algorithm
Ramer–Douglas–Peucker algorithm
Data compression

No related works found for this paper.