K-means clustering via principal component analysis
Lawrence Berkeley National Laboratory
Abstract
Principal component analysis (PCA) is a widely used statistical technique for unsupervised dimension reduction. K-means clustering is a commonly used data clustering method for unsupervised learning tasks. Here we prove that principal components are the continuous solutions to the discrete cluster membership indicators for K-means clustering. New lower bounds for the K-means objective function are derived, equal to the total variance minus the eigenvalues of the data covariance matrix. These results indicate that unsupervised dimension reduction is closely related to unsupervised learning. Several implications are discussed. On dimension reduction, the result provides new insights to the observed effectiveness…
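As a rough numerical illustration of the lower bound mentioned in the abstract, the sketch below compares the K-means objective of a clustering against the total variance minus the leading eigenvalues. It rests on assumptions the abstract does not spell out: the bound is taken in unnormalized form (sums of squares rather than averages), it subtracts the K-1 largest eigenvalues of the centered scatter matrix, and the clustering comes from a plain Lloyd iteration. The function names `kmeans_objective`, `pca_lower_bound`, and `lloyd` are hypothetical and not from the paper.

```python
import numpy as np

def kmeans_objective(X, labels):
    """Sum of squared distances from each point to its cluster centroid."""
    total = 0.0
    for k in np.unique(labels):
        members = X[labels == k]
        total += ((members - members.mean(axis=0)) ** 2).sum()
    return total

def pca_lower_bound(X, K):
    """Assumed form of the bound: total (unnormalized) variance minus the
    K-1 largest eigenvalues of the centered scatter matrix Y^T Y."""
    Y = X - X.mean(axis=0)
    eigvals = np.linalg.eigvalsh(Y.T @ Y)      # returned in ascending order
    total_variance = (Y ** 2).sum()
    return total_variance - eigvals[::-1][:K - 1].sum()

def lloyd(X, K, n_iter=50, seed=0):
    """Plain Lloyd iteration, used here only to obtain some clustering."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every point to every center.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for k in range(K):
            if np.any(labels == k):            # skip empty clusters
                centers[k] = X[labels == k].mean(axis=0)
    return labels

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic data: three Gaussian blobs in five dimensions.
    X = np.vstack([rng.normal(loc=c, scale=0.5, size=(40, 5))
                   for c in (-3.0, 0.0, 3.0)])
    labels = lloyd(X, K=3)
    print("K-means objective:", kmeans_objective(X, labels))
    print("PCA lower bound:  ", pca_lower_bound(X, K=3))
```

Under these assumptions the printed objective should never fall below the printed bound, whatever partition Lloyd's iteration settles on; how tight the gap is depends on the data.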
Citation impact
- FWCI: 6.87
- Percentile: 100%
- References: 23
Topics & keywords
- Principal component analysis
- Cluster analysis
- Dimensionality reduction
- Unsupervised learning
- Pattern recognition (psychology)
- Sparse PCA
- Singular value decomposition
- Artificial intelligence