A prediction-based resampling method for estimating the number of clusters in a dataset
University of California, Berkeley · Zero to Three · +2 more institutions
Abstract
Microarray technology is increasingly being applied in biological and medical research to address a wide range of problems, such as the classification of tumors. An important statistical problem associated with tumor classification is the identification of new tumor classes using gene-expression profiles. Two essential aspects of this clustering problem are estimating the number of clusters, if any, in a dataset, and allocating tumor samples to these clusters while assessing the confidence of cluster assignments for individual samples. Here we address the first of these problems.
We have developed a new prediction-based resampling method, Clest, to estimate the number of clusters in a dataset. The performance of the new and existing methods was compared using simulated data and gene-expression data from four recently published cancer microarray studies. Clest was generally found to be more accurate and robust than the six existing methods considered in the study.
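The abstract does not spell out how Clest works, so the following is only a minimal sketch of the general idea behind prediction-based resampling, not the authors' procedure. It assumes k-means as the clustering method, a nearest-centroid rule as the classifier, the Rand index as the agreement measure, and a 2:1 learning/test split; all function names (`kmeans`, `rand_index`, `prediction_score`) and parameters are hypothetical choices made for illustration. For a candidate number of clusters k, the data are repeatedly split, the learning set is clustered and used to predict labels for the test set, and those predictions are compared with an independent clustering of the test set.

```python
import random
from itertools import combinations
from statistics import median


def _dist2(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p, centroids):
    """Index of the centroid closest to point p (nearest-centroid classifier)."""
    return min(range(len(centroids)), key=lambda j: _dist2(p, centroids[j]))


def kmeans(points, k, rng, iters=50):
    """Plain Lloyd's k-means; returns (labels, centroids)."""
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(centroids[j])  # keep an empty cluster's centroid
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


def rand_index(a, b):
    """Fraction of point pairs on which two labelings agree (same vs. different)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def prediction_score(points, k, n_splits=10, seed=0):
    """Median agreement between predicted and clustered test labels for a given k."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_splits):
        shuffled = points[:]
        rng.shuffle(shuffled)
        cut = 2 * len(shuffled) // 3
        learn, test = shuffled[:cut], shuffled[cut:]
        _, centroids = kmeans(learn, k, rng)               # cluster the learning set
        predicted = [nearest(p, centroids) for p in test]  # predict test labels
        clustered, _ = kmeans(test, k, rng)                # cluster the test set itself
        scores.append(rand_index(predicted, clustered))
    return median(scores)


if __name__ == "__main__":
    data_rng = random.Random(1)
    blob_a = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(15)]
    blob_b = [(data_rng.gauss(100, 1), data_rng.gauss(100, 1)) for _ in range(15)]
    for k in (2, 3, 4):
        print(k, prediction_score(blob_a + blob_b, k))
```

A raw agreement score of this kind cannot be used directly to pick k: with k = 1 every labeling agrees trivially, so the scores need some form of calibration before they are compared across values of k. This sketch leaves that step out.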
Citation impact
- FWCI: 8.74
- Percentile: 100%
- References: 40
Authors (2)
Topics & keywords
- Resampling
- Cluster analysis
- Data mining
- Computer science
- DNA microarray
- Range (aeronautics)
- Identification (biology)
- Biology