Scalable k-means++

Bahmani, Bahman; Moseley, Benjamin; Vattani, Andrea; Kumar, Ravi; Vassilvitskii, Sergei

doi:10.14778/2180912.2180915

articleProceedings of the VLDB EndowmentMar 1, 2012Closed access

Scalable k-means++

BBBahman Bahmani BMBenjamin Moseley AVAndrea Vattani RKRavi Kumar SVSergei Vassilvitskii

Stanford University · University of Illinois Urbana-Champaign · +5 more institutions

Indexed incrossref

Abstract

Over half a century old and showing no signs of aging, k -means remains one of the most popular data processing algorithms. As is well-known, a proper initialization of k -means is crucial for obtaining a good final solution. The recently proposed k -means++ initialization algorithm achieves this, obtaining an initial set of centers that is provably close to the optimum solution. A major downside of the k -means++ is its inherent sequential nature, which limits its applicability to massive data: one must make k passes over the data to find a good initial set of centers. In this work we show how to drastically reduce the number of passes needed to obtain, in parallel, a good initialization. This is unlike…

Citation impact

633

total citations

FWCI: 21.29
Percentile: 100%
References: 40

Citations per year

Authors

5

Topics & keywords

Topics

Keywords

Initialization
Computer science
Scalability
Set (abstract data type)
Logarithm
Constant (computer programming)
Algorithm
Scale (ratio)

No related works found for this paper.