An Entropy Weighting k-Means Algorithm for Subspace Clustering of High-Dimensional Sparse Data

Jing, Liping; Ng, Michael K.; Huang, Joshua Zhexue

doi:10.1109/tkde.2007.1048

Article · IEEE Transactions on Knowledge and Data Engineering · Jun 28, 2007 · Closed access

University of Hong Kong · Hong Kong Baptist University

Indexed in Crossref

Abstract

This paper presents a new k-means-type algorithm for clustering high-dimensional objects in subspaces. In high-dimensional data, clusters of objects often exist in subspaces rather than in the entire space. For example, in text clustering, clusters of documents on different topics are characterized by different subsets of terms or keywords. The keywords for one cluster may not occur in the documents of other clusters. This is the data sparsity problem faced in clustering high-dimensional data. In the new algorithm, we extend the k-means clustering process to calculate a weight for each dimension in each cluster and use the weight values to identify the subsets of important dimensions that categorize different…
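The per-cluster, per-dimension weighting described in the abstract can be illustrated with a minimal sketch. The truncated abstract does not give the update formulas, so the block below assumes a softmax-style weight update (weights proportional to exp(-D / gamma), where D is the within-cluster dispersion along a dimension), which is one common way to realize entropy-regularized weights. The function name `ewkm_sketch`, the parameter `gamma`, and the initialization scheme are hypothetical choices for illustration, not the authors' published procedure.

```python
import numpy as np

def ewkm_sketch(X, k, gamma=1.0, n_iter=20, seed=0):
    """Illustrative k-means variant with one weight per (cluster, dimension).

    Assumption: weights follow a softmax over negative within-cluster
    dispersion, scaled by the hypothetical parameter `gamma`.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    # Initialize centers from k distinct data points, weights uniformly.
    centers = X[rng.choice(n, size=k, replace=False)].astype(float)
    weights = np.full((k, m), 1.0 / m)
    labels = np.zeros(n, dtype=int)

    for _ in range(n_iter):
        # Assignment step: weighted squared distance to every center.
        sq = (X[:, None, :] - centers[None, :, :]) ** 2   # shape (n, k, m)
        dist = (sq * weights[None, :, :]).sum(axis=2)     # shape (n, k)
        labels = dist.argmin(axis=1)

        for l in range(k):
            members = X[labels == l]
            if len(members) == 0:
                continue  # keep the previous center and weights
            # Center step: plain mean of the cluster members.
            centers[l] = members.mean(axis=0)
            # Weight step: low dispersion along a dimension -> high weight.
            dispersion = ((members - centers[l]) ** 2).sum(axis=0)
            logits = -dispersion / gamma
            logits -= logits.max()        # numerical stability
            w = np.exp(logits)
            weights[l] = w / w.sum()      # each row sums to 1

    return labels, centers, weights
```

In this sketch, dimensions along which a cluster is tight receive large weights, so each row of `weights` can be read as a soft indicator of that cluster's subspace. A larger `gamma` spreads weight more evenly across dimensions; a smaller one concentrates it on the few lowest-dispersion dimensions.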

Citation impact

Total citations: 622

FWCI: 37.19
Percentile: 100%
References: 47

Authors: 3

Topics & keywords

Keywords

Cluster analysis
Clustering high-dimensional data
Computer science
CURE data clustering algorithm
Correlation clustering
Single-linkage clustering
Canopy clustering algorithm
Fuzzy clustering
