An Entropy Weighting k-Means Algorithm for Subspace Clustering of High-Dimensional Sparse Data
University of Hong Kong · Hong Kong Baptist University
Abstract
This paper presents a new k-means type algorithm for clustering high-dimensional objects in subspaces. In high-dimensional data, clusters of objects often exist in subspaces rather than in the entire space. For example, in text clustering, clusters of documents of different topics are categorized by different subsets of terms or keywords. The keywords for one cluster may not occur in the documents of other clusters. This is a data sparsity problem faced in clustering high-dimensional data. In the new algorithm, we extend the k-means clustering process to calculate a weight for each dimension in each cluster and use the weight values to identify the subsets of important dimensions that categorize different…
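The idea of a per-cluster weight for each dimension can be illustrated with a minimal sketch. The abstract is truncated before it gives the update rules, so everything specific here is an assumption rather than the authors' formulation: the function name `ewkm_sketch`, the parameter `gamma`, and the exponential (softmax-style) weight update, which is only suggested by the "entropy weighting" in the title. The toy data are made up for illustration.

```python
import math

def ewkm_sketch(points, centers, gamma=1.0, n_iter=10):
    """Illustrative k-means variant with one weight per (cluster, dimension).

    Assumption: weights are a softmax of negative within-cluster dispersion,
    scaled by a hypothetical parameter `gamma`. Not taken from the source.
    """
    k, d = len(centers), len(points[0])
    centers = [list(c) for c in centers]
    weights = [[1.0 / d] * d for _ in range(k)]  # start with uniform weights
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment: nearest center under each cluster's own weighted distance.
        for j, x in enumerate(points):
            dists = [
                sum(weights[l][i] * (x[i] - centers[l][i]) ** 2 for i in range(d))
                for l in range(k)
            ]
            labels[j] = dists.index(min(dists))
        for l in range(k):
            members = [x for x, lab in zip(points, labels) if lab == l]
            if not members:
                continue  # keep the old center and weights for an empty cluster
            # Center update: per-dimension mean of the cluster members.
            centers[l] = [sum(x[i] for x in members) / len(members) for i in range(d)]
            # Dispersion of the cluster along each dimension.
            disp = [sum((x[i] - centers[l][i]) ** 2 for x in members) for i in range(d)]
            # Weight update: low dispersion gives a high weight. Subtracting the
            # minimum keeps exp() numerically stable without changing the result.
            low = min(disp)
            expo = [math.exp(-(v - low) / gamma) for v in disp]
            total = sum(expo)
            weights[l] = [e / total for e in expo]
    return labels, centers, weights

# Toy data: the first three points agree on dimension 0, the last three on dimension 1.
points = [
    (0.0, 0.0, 5.0), (0.1, 3.0, 1.0), (0.0, 6.0, 9.0),
    (10.0, 4.0, 0.0), (13.0, 4.1, 8.0), (16.0, 4.0, 3.0),
]
labels, centers, weights = ewkm_sketch(points, [points[0], points[3]])
```

On this toy input each cluster places most of its weight on the one dimension along which its members agree, which is the sense in which the weights identify a cluster's subspace. A small `gamma` concentrates weight on very few dimensions; a larger value spreads it more evenly.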
Citation impact
- FWCI: 37.19
- Percentile: 100%
- References: 47
Authors: 3
Topics & keywords
- Cluster analysis
- Clustering high-dimensional data
- Computer science
- CURE data clustering algorithm
- Correlation clustering
- Single-linkage clustering
- Canopy clustering algorithm
- Fuzzy clustering