Scalable k-means++
Stanford University · University of Illinois Urbana-Champaign · +5 more institutions
Abstract
Over half a century old and showing no signs of aging, k -means remains one of the most popular data processing algorithms. As is well-known, a proper initialization of k -means is crucial for obtaining a good final solution. The recently proposed k -means++ initialization algorithm achieves this, obtaining an initial set of centers that is provably close to the optimum solution. A major downside of the k -means++ is its inherent sequential nature, which limits its applicability to massive data: one must make k passes over the data to find a good initial set of centers. In this work we show how to drastically reduce the number of passes needed to obtain, in parallel, a good initialization. This is unlike…
Citation impact
- FWCI
- 21.29
- Percentile
- 100%
- References
- 40
Authors
5- BBBahman BahmaniCorresponding
Stanford University
- BMBenjamin Moseley
University of Illinois Urbana-Champaign, University of Illinois System
- AVAndrea Vattani
University of California San Diego
- RKRavi Kumar
Yahoo (United Kingdom), Yahoo (United States)
- SVSergei Vassilvitskii
Research!America (United States), Yahoo (United States)
Topics & keywords
- Initialization
- Computer science
- Scalability
- Set (abstract data type)
- Logarithm
- Constant (computer programming)
- Algorithm
- Scale (ratio)