Article · IEEE Transactions on Knowledge and Data Engineering · Jun 28, 2007 · Closed access

An Entropy Weighting k-Means Algorithm for Subspace Clustering of High-Dimensional Sparse Data

University of Hong Kong · Hong Kong Baptist University

Indexed in Crossref

Abstract

This paper presents a new k-means type algorithm for clustering high-dimensional objects in subspaces. In high-dimensional data, clusters of objects often exist in subspaces rather than in the entire space. For example, in text clustering, clusters of documents on different topics are characterized by different subsets of terms or keywords. The keywords for one cluster may not occur in the documents of other clusters. This is the data sparsity problem faced in clustering high-dimensional data. In the new algorithm, we extend the k-means clustering process to calculate a weight for each dimension in each cluster and use the weight values to identify the subsets of important dimensions that categorize different…
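The idea of a per-cluster weight for each dimension can be illustrated with a small sketch. The abstract is truncated and does not give the update rule, so everything here is an assumption made for illustration: the helper names `dimension_weights` and `weighted_distance`, the parameter `gamma`, and the choice to make each weight proportional to `exp(-D_j / gamma)`, where `D_j` is the cluster's squared dispersion along dimension `j`. It is not presented as the paper's exact algorithm.

```python
import math


def dimension_weights(points, center, gamma=1.0):
    """Weights over dimensions for ONE cluster (hypothetical helper).

    Assumption: weight_j is proportional to exp(-D_j / gamma), where D_j is
    the summed squared deviation of the cluster's points from its center
    along dimension j. The weights are normalised to sum to 1.
    """
    n_dims = len(center)
    dispersion = [
        sum((p[j] - center[j]) ** 2 for p in points) for j in range(n_dims)
    ]
    # Shift by the smallest dispersion so the exponentials cannot all underflow.
    d_min = min(dispersion)
    raw = [math.exp(-(d - d_min) / gamma) for d in dispersion]
    total = sum(raw)
    return [r / total for r in raw]


def weighted_distance(point, center, weights):
    """Squared distance in which each dimension is scaled by its weight."""
    return sum(w * (x - c) ** 2 for w, x, c in zip(weights, point, center))


# A cluster that is tight in dimension 0 and spread out in dimension 1.
cluster = [[1.0, 0.0], [1.0, 4.0], [1.0, 8.0]]
center = [1.0, 4.0]
weights = dimension_weights(cluster, center, gamma=10.0)
```

In this toy cluster, dimension 0 has zero dispersion and therefore receives the larger weight, so it dominates the weighted distance used when assigning objects to that cluster. A larger `gamma` would spread the weights more evenly across dimensions.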

Citation impact

  • Total citations: 622
  • FWCI: 37.19
  • Percentile: 100%
  • References: 47

Authors

3

Topics & keywords

Keywords
  • Cluster analysis
  • Clustering high-dimensional data
  • Computer science
  • CURE data clustering algorithm
  • Correlation clustering
  • Single-linkage clustering
  • Canopy clustering algorithm
  • Fuzzy clustering