Cluster Validation by Prediction Strength

Tibshirani, Robert; Walther, Guenther

doi:10.1198/106186005x59243

Article · Journal of Computational and Graphical Statistics · Aug 31, 2005 · Closed access

Stanford Health Care · Stanford University

Indexed in Crossref

Abstract

This article proposes a new quantity for assessing the number of groups or clusters in a dataset. The key idea is to view clustering as a supervised classification problem, in which we must also estimate the “true” class labels. The resulting “prediction strength” measure assesses how many groups can be predicted from the data, and how well. In the process, we develop novel notions of bias and variance for unlabeled data. Prediction strength performs well in simulation studies, and we apply it to clusters of breast cancer samples from a DNA microarray study. Finally, some consistency properties of the method are established.
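
The idea of treating clustering as a classification problem can be illustrated with a minimal sketch. It follows the commonly described formulation of prediction strength: cluster a training half and a test half separately, assign each test point to its nearest training centroid, and for every test cluster measure the fraction of point pairs that the training centroids also place together; the score for a given k is the minimum of these fractions. The abstract does not spell out these details, so everything below is an assumption: the plain k-means with farthest-point initialisation, the single train/test split, the choice to skip test clusters with fewer than two points, and the helper names `kmeans`, `prediction_strength`, and `two_blobs`.

```python
import random
from itertools import combinations


def _dist2(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _nearest(point, centers):
    """Index of the center closest to `point`."""
    return min(range(len(centers)), key=lambda j: _dist2(point, centers[j]))


def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means with deterministic farthest-point initialisation
    (an illustrative stand-in, not necessarily the clustering used in the paper)."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(max(points, key=lambda p: min(_dist2(p, c) for c in centers)))
    for _ in range(iters):
        labels = [_nearest(p, centers) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return centers, [_nearest(p, centers) for p in points]


def prediction_strength(train, test, k):
    """Minimum, over test clusters, of the fraction of within-cluster pairs
    that the training centroids also assign to a common cluster."""
    train_centers, _ = kmeans(train, k)
    _, test_labels = kmeans(test, k)
    predicted = [_nearest(p, train_centers) for p in test]
    strengths = []
    for j in range(k):
        idx = [i for i, lab in enumerate(test_labels) if lab == j]
        if len(idx) < 2:
            continue  # assumption: clusters with no pairs are skipped
        pairs = list(combinations(idx, 2))
        agree = sum(predicted[a] == predicted[b] for a, b in pairs)
        strengths.append(agree / len(pairs))
    return min(strengths) if strengths else 0.0


def two_blobs(n_per_blob=40, seed=0):
    """Hypothetical toy data: two well-separated 2-D Gaussian blobs, interleaved."""
    rng = random.Random(seed)
    pts = []
    for _ in range(n_per_blob):
        pts.append((rng.gauss(0, 1), rng.gauss(0, 1)))
        pts.append((rng.gauss(20, 1), rng.gauss(20, 1)))
    return pts


points = two_blobs()
half = len(points) // 2
train, test = points[:half], points[half:]
for k in (1, 2, 3):
    print(k, prediction_strength(train, test, k))
```

On this toy data the two true groups are far apart, so the score for k = 2 is perfect, while splitting further generally produces test clusters that the training centroids do not reproduce as cleanly. In practice the score would be averaged over several random splits rather than computed from one.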

Citation impact

Total citations: 648

FWCI: 31.23
Percentile: 100%
References: 15

Topics & keywords

Keywords

Consistency (knowledge bases)
Cluster analysis
Data mining
Computer science
Variance (accounting)
Measure (data warehouse)
Cluster (spacecraft)
Process (computing)