Using the triangle inequality to accelerate k-means

Elkan, Charles

articleAug 21, 2003Closed access

Using the triangle inequality to accelerate k-means

Abstract

The k-means algorithm is by far the most widely used method for discovering clusters in data. We show how to accelerate it dramatically, while still always computing exactly the same result as the standard algorithm. The accelerated algorithm avoids unnecessary distance calculations by applying the triangle inequality in two different ways, and by keeping track of lower and upper bounds for distances between points and centers. Experiments show that the new algorithm is effective for datasets with up to 1000 dimensions, and becomes more and more effective as the number k of clusters increases. For k&gt;=20 it is many times faster than the best previously known accelerated k-means method.

Citation impact

661

total citations

FWCI: 15.45
Percentile: 100%
References: 29

Citations per year

Authors

1

CE
Charles ElkanCorresponding
University of California, San Diego

Topics & keywords

Topics

Keywords

Triangle inequality
Computer science
Upper and lower bounds
Algorithm
Inequality
Running time
Mathematics
Combinatorics

No related works found for this paper.