Article · Proceedings of the VLDB Endowment · Mar 1, 2012 · Closed access

Scalable k-means++

Stanford University · University of Illinois Urbana-Champaign · +5 more institutions

Indexed in Crossref

Abstract

Over half a century old and showing no signs of aging, k-means remains one of the most popular data processing algorithms. As is well known, a proper initialization of k-means is crucial for obtaining a good final solution. The recently proposed k-means++ initialization algorithm achieves this, obtaining an initial set of centers that is provably close to the optimum solution. A major downside of k-means++ is its inherently sequential nature, which limits its applicability to massive data: one must make k passes over the data to find a good initial set of centers. In this work we show how to drastically reduce the number of passes needed to obtain, in parallel, a good initialization. This is unlike…
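
To illustrate why the sequential k-means++ seeding described in the abstract needs k passes, here is a minimal sketch in Python. It is not taken from the paper: the names `sq_dist` and `kmeanspp_seed` are hypothetical, and the sketch assumes the standard k-means++ rule of sampling each new center with probability proportional to its squared distance from the nearest center chosen so far. It covers only the sequential baseline, not the parallel method the paper proposes.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeanspp_seed(points, k, rng=None):
    """Sequential k-means++ seeding (illustrative sketch, hypothetical helper).

    Returns the chosen centers and the number of full scans over `points`.
    """
    rng = rng or random.Random()
    # The first center is drawn uniformly at random.
    centers = [rng.choice(points)]
    # d2[i] is the squared distance from points[i] to its nearest center.
    d2 = [sq_dist(p, centers[0]) for p in points]
    passes = 1  # one scan of the data to initialize d2
    while len(centers) < k:
        if sum(d2) == 0:
            # Every point coincides with a center; fall back to uniform.
            nxt = rng.choice(points)
        else:
            # Assumed sampling rule: probability proportional to d2.
            nxt = rng.choices(points, weights=d2, k=1)[0]
        centers.append(nxt)
        # Each new center depends on the previous ones, so the distances
        # must be refreshed with another scan before the next draw.
        d2 = [min(d, sq_dist(p, nxt)) for d, p in zip(d2, points)]
        passes += 1
    return centers, passes


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0), (20.0, 0.0)]
centers, passes = kmeanspp_seed(points, k=3, rng=random.Random(0))
print(centers, passes)
```

Because every draw depends on the distances produced by the previous draw, the loop cannot be reordered or run concurrently as written, which is the bottleneck on massive data that the abstract points to.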
