An efficient k-means clustering algorithm: analysis and implementation
Indexed incrossref
Abstract
In k-means clustering, we are given a set of n data points in d-dimensional space R/sup d/ and an integer k and the problem is to determine a set of k points in Rd, called centers, so as to minimize the mean squared distance from each data point to its nearest center. A popular heuristic for k-means clustering is Lloyd's (1982) algorithm. We present a simple and efficient implementation of Lloyd's k-means clustering algorithm, which we call the filtering algorithm. This algorithm is easy to implement, requiring a kd-tree as the only major data structure. We establish the practical efficiency of the filtering algorithm in two ways. First, we present a data-sensitive analysis of the algorithm's running time,…
Citation impact
5,631
total citations
- FWCI
- 25.75
- Percentile
- 100%
- References
- 69
Citations per year
Authors
6Topics & keywords
Topics
Keywords
- Cluster analysis
- Computer science
- Algorithm
- Canopy clustering algorithm
- Data stream clustering
- CURE data clustering algorithm
- Ramer–Douglas–Peucker algorithm
- Data compression
No related works found for this paper.