Parallel Spectral Clustering in Distributed Systems

Yahoo (United Kingdom) · Yahoo (United States) · +3 more institutions

PubMed
Indexed incrossrefpubmed

Abstract

Spectral clustering algorithms have been shown to be more effective in finding clusters than some traditional algorithms, such as k-means. However, spectral clustering suffers from a scalability problem in both memory use and computational time when the size of a data set is large. To perform clustering on large data sets, we investigate two representative ways of approximating the dense similarity matrix. We compare one approach by sparsifying the matrix with another by the Nyström method. We then pick the strategy of sparsifying the matrix via retaining nearest neighbors and investigate its parallelization. We parallelize both memory use and computation on distributed computers. Through an empirical study on…

Citation impact

621
total citations
FWCI
48.46
Percentile
100%
References
72
Citations per year

Authors

5

Topics & keywords

Keywords
  • Cluster analysis
  • Computer science
  • Scalability
  • Spectral clustering
  • Set (abstract data type)
  • Computation
  • Distributed memory
  • Similarity (geometry)
No related works found for this paper.

Funding