A Framework for Feature Selection in Clustering

Stanford University

Indexed in: Crossref, PubMed

Abstract

We consider the problem of clustering observations using a potentially large set of features. One might expect that the true underlying clusters present in the data differ only with respect to a small fraction of the features, and will be missed if one clusters the observations using the full set of features. We propose a novel framework for sparse clustering, in which one clusters the observations using an adaptively chosen subset of the features. The method uses a lasso-type penalty to select the features. We use this framework to develop simple methods for sparse K-means and sparse hierarchical clustering. A single criterion governs both the selection of the features and the resulting clusters. These…
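The abstract does not spell out the criterion, so the following is only a minimal sketch of one way the idea could look for K-means. It assumes a formulation in which nonnegative feature weights w maximize the weighted per-feature between-cluster sum of squares subject to an L2 bound of 1 and an L1 (lasso-type) bound s. The function names (`sparse_kmeans`, `update_weights`, `bcss_per_feature`), the tuning parameter `s`, and the toy data are all hypothetical and chosen for illustration.

```python
import numpy as np


def soft_threshold(a, delta):
    """Shrink each entry of a toward zero by delta (lasso-style operator)."""
    return np.sign(a) * np.maximum(np.abs(a) - delta, 0.0)


def bcss_per_feature(X, labels):
    """Per-feature between-cluster sum of squares: total SS minus within SS."""
    total = ((X - X.mean(axis=0)) ** 2).sum(axis=0)
    within = np.zeros(X.shape[1])
    for c in np.unique(labels):
        Xc = X[labels == c]
        within += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
    return total - within


def update_weights(bcss, s, n_bisect=60):
    """Assumed weight step: maximize sum_j w_j * bcss_j subject to
    ||w||_2 <= 1, ||w||_1 <= s, w >= 0. Assumes s >= 1 and bcss not all zero."""
    a = np.maximum(bcss, 0.0)
    w = a / np.linalg.norm(a)
    if w.sum() <= s:  # L1 bound already satisfied, no thresholding needed
        return w
    lo, hi = 0.0, a.max()
    for _ in range(n_bisect):  # bisection on the threshold delta
        delta = 0.5 * (lo + hi)
        w = soft_threshold(a, delta)
        w /= np.linalg.norm(w)
        if w.sum() > s:
            lo = delta
        else:
            hi = delta
    return w


def kmeans(X, k, rng, n_init=5, n_iter=100):
    """Plain Lloyd's algorithm with random restarts; returns the best labels."""
    best_labels, best_cost = None, np.inf
    for _ in range(n_init):
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = d.argmin(axis=1)
            new_centers = np.array([
                X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
                for c in range(k)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        cost = d.min(axis=1).sum()
        if cost < best_cost:
            best_labels, best_cost = labels, cost
    return best_labels


def sparse_kmeans(X, k, s, n_outer=10, seed=0):
    """Alternate between clustering with fixed weights and updating weights."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    w = np.full(p, 1.0 / np.sqrt(p))  # start with equal weights
    for _ in range(n_outer):
        # K-means on features scaled by sqrt(w) weights squared distances by w
        labels = kmeans(X * np.sqrt(w), k, rng)
        w = update_weights(bcss_per_feature(X, labels), s)
    return labels, w


# Toy data: 60 observations, 30 features, clusters differ only in the first 4.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 30))
X[30:, :4] += 6.0
labels, w = sparse_kmeans(X, k=2, s=1.9)
print(np.round(w, 3))
```

In this sketch a smaller `s` forces more weights to exactly zero, which is how the penalty performs feature selection. The same weights drive both the clustering step and the selection step, which matches the abstract's remark that a single criterion governs both.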

Citation impact

Total citations: 708
FWCI: 9.26
Percentile: 100%
References: 39

Authors

2

Topics & keywords

Keywords
  • Cluster analysis
  • Computer science
  • Feature selection
  • Hierarchical clustering
  • Set (abstract data type)
  • Single-linkage clustering
  • Data mining
  • Selection (genetic algorithm)