Article · Genome Biology · Jun 25, 2002 · Gold OA

A prediction-based resampling method for estimating the number of clusters in a dataset

University of California, Berkeley · Zero to Three · +2 more institutions

Indexed in: Crossref, DOAJ, PubMed

Abstract

Background

Microarray technology is increasingly being applied in biological and medical research to address a wide range of problems, such as the classification of tumors. An important statistical problem associated with tumor classification is the identification of new tumor classes using gene-expression profiles. Two essential aspects of this clustering problem are: to estimate the number of clusters, if any, in a dataset; and to allocate tumor samples to these clusters, and assess the confidence of cluster assignments for individual samples. Here we address the first of these problems.

Results

We have developed a new prediction-based resampling method, Clest, to estimate the number of clusters in a dataset. The performance of the new method and of six existing methods was compared using simulated data and gene-expression data from four recently published cancer microarray studies. Clest was generally found to be more accurate and robust than the existing methods.
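The abstract does not spell out the procedure, so the following is only a minimal sketch of the general idea behind prediction-based resampling, not the authors' Clest implementation. It repeatedly splits the data into a learning set and a test set, clusters the learning set, uses those clusters to predict labels for the test set, clusters the test set independently, and scores how well the two labelings agree. The choices of k-means with farthest-first initialization, a nearest-centroid predictor, the Rand index as the agreement score, and every function name and the toy data below are assumptions made for illustration. Any comparison of the score against a reference or null distribution is left out.

```python
import random
import statistics
from itertools import combinations


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda c: sq_dist(point, centroids[c]))


def kmeans(points, k, n_iter=50):
    """Plain Lloyd's k-means with deterministic farthest-first initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from all centroids chosen so far.
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    for _ in range(n_iter):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(centroids[c])  # keep an empty cluster's centroid
        if updated == centroids:
            break
        centroids = updated
    return centroids, [nearest(p, centroids) for p in points]


def rand_index(a, b):
    """Fraction of point pairs on which two labelings agree (same vs. different)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def prediction_agreement(points, k, n_splits=20, seed=0):
    """Median agreement between predicted and re-clustered test labels for a given k."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_splits):
        shuffled = points[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        learn, test = shuffled[:half], shuffled[half:]
        learn_centroids, _ = kmeans(learn, k)
        predicted = [nearest(p, learn_centroids) for p in test]  # predict from learning set
        _, clustered = kmeans(test, k)  # cluster the test set on its own
        scores.append(rand_index(predicted, clustered))
    return statistics.median(scores)


# Hypothetical toy data: two tight, well-separated groups of 20 points each.
blob_a = [(0.1 * i, 0.1 * j) for i in range(4) for j in range(5)]
blob_b = [(10 + 0.1 * i, 10 + 0.1 * j) for i in range(4) for j in range(5)]
demo_points = blob_a + blob_b

for k in (2, 3, 4):
    print(k, prediction_agreement(demo_points, k))
```

In this sketch, a candidate number of clusters that matches real structure in the data should give high agreement across splits, while a poorly supported value tends to give unstable partitions and lower agreement. A raw agreement score is not comparable across values of k on its own, so a usable estimator would also need some calibration step, which this illustration does not attempt.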

Citation impact

Total citations: 709
FWCI: 8.74
Percentile: 100%
References: 40

Authors

2

Topics & keywords

Keywords
  • Resampling
  • Cluster analysis
  • Data mining
  • Computer science
  • DNA microarray
  • Range (aeronautics)
  • Identification (biology)
  • Biology