A comprehensive comparison of random forests and support vector machines for microarray-based cancer classification

Statnikov, Alexander; Wang, Lily; Aliferis, Constantin

doi:10.1186/1471-2105-9-319

Article, BMC Bioinformatics, Jul 22, 2008, Gold OA

Vanderbilt University

Indexed in: Crossref, DOAJ, PubMed

Abstract

Background

Cancer diagnosis and clinical outcome prediction are among the most important emerging applications of gene expression microarray technology with several molecular signatures on their way toward clinical deployment. Use of the most accurate classification algorithms available for microarray gene expression data is a critical ingredient in order to develop the best possible molecular signatures for patient care. As suggested by a large body of literature to date, support vector machines can be considered "best of class" algorithms for classification of such data. Recent work, however, suggests that random forest classifiers may outperform support vector machines in this domain.

Results

In the present paper we identify methodological biases of prior work comparing random forests and support vector machines and conduct a new rigorous evaluation of the two algorithms that corrects these limitations. Our experiments use 22 diagnostic and prognostic datasets and show that support vector machines outperform random forests, often by a large margin. Our data also underline the importance of sound research design in benchmarking and comparison of bioinformatics algorithms.

Citation impact

Total citations: 688

FWCI: 11.79
Percentile: 100%
References: 36

Authors: 3

Topics & keywords

Keywords

Random forest
Support vector machine
Computer science
Benchmarking
Machine learning
Data mining
Artificial intelligence
DNA microarray

UN Sustainable Development Goals

Life on Land
