Evaluation of variable selection methods for random forests and omics data sets

Degenhardt, Frauke; Seifert, Stephan; Szymczak, Silke

doi:10.1093/bib/bbx124

Article · Briefings in Bioinformatics · Sep 19, 2017 · Hybrid OA

Christian-Albrechts-Universität zu Kiel · Institute of Molecular Biology

Indexed in: Crossref, DOAJ, PubMed

Abstract

Machine learning methods, and in particular random forests, are promising approaches for prediction based on high-dimensional omics data sets. They provide variable importance measures to rank predictors according to their predictive power. If building a prediction model is the main goal of a study, a minimal set of variables with good prediction performance is often selected. However, if the objective is to identify the variables involved in order to find active networks and pathways, approaches that aim to select all relevant variables should be preferred. We evaluated several variable selection procedures based on simulated data as well as publicly available experimental methylation and gene expression data. Our…
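The abstract notes that random forests provide variable importance measures for ranking predictors. As a rough illustration of that idea only, the sketch below computes a permutation-based importance on simulated data, using a toy forest of decision stumps written in plain Python. Everything in it is an assumption made for illustration: the function names (`make_data`, `fit_forest`, `permutation_importance`), the simulation design with a single relevant predictor, the use of a held-out set in place of out-of-bag samples, and the settings `n_trees=51` and `mtry=4`. It is not one of the selection procedures evaluated in the paper.

```python
import random

def make_data(n, p, rng):
    """Simulate n samples with p Gaussian predictors; only predictor 0 sets the label."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [1 if row[0] > 0 else 0 for row in X]
    return X, y

def fit_stump(X, y, features):
    """Return the (feature, threshold, polarity) split with the fewest training errors."""
    best = (len(y) + 1, None, None, None)
    for j in features:
        for t in sorted({row[j] for row in X}):
            # Errors made by the rule "predict 1 when x_j > t".
            err = sum((row[j] > t) != (label == 1) for row, label in zip(X, y))
            # Polarity 0 is the flipped rule, so its error count is the complement.
            for polarity, e in ((1, err), (0, len(y) - err)):
                if e < best[0]:
                    best = (e, j, t, polarity)
    return best[1:]

def predict_stump(stump, row):
    j, t, polarity = stump
    above = row[j] > t
    return int(above) if polarity == 1 else int(not above)

def fit_forest(X, y, n_trees, mtry, rng):
    """Toy forest: each stump sees a bootstrap sample and mtry random predictors."""
    n, p = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        features = rng.sample(range(p), mtry)
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], features))
    return forest

def predict_forest(forest, row):
    """Majority vote over all stumps."""
    votes = sum(predict_stump(s, row) for s in forest)
    return int(2 * votes > len(forest))

def accuracy(forest, X, y):
    return sum(predict_forest(forest, r) == label for r, label in zip(X, y)) / len(y)

def permutation_importance(forest, X, y, rng):
    """Importance of predictor j = drop in accuracy after shuffling column j."""
    baseline = accuracy(forest, X, y)
    importances = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        rng.shuffle(column)
        X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
        importances.append(baseline - accuracy(forest, X_perm, y))
    return importances

# Hypothetical settings: 200 samples, 5 predictors, 120 for training, 80 held out.
rng = random.Random(0)
X, y = make_data(200, 5, rng)
X_train, y_train, X_test, y_test = X[:120], y[:120], X[120:], y[120:]
forest = fit_forest(X_train, y_train, n_trees=51, mtry=4, rng=rng)
importances = permutation_importance(forest, X_test, y_test, rng)
ranking = sorted(range(len(importances)), key=lambda j: -importances[j])
print("importances:", [round(v, 3) for v in importances])
print("ranking (most important first):", ranking)
```

A minimal-set strategy would keep only the top of such a ranking, whereas an all-relevant strategy needs a rule for deciding which importances differ from noise. The sketch stops at the ranking and implements neither.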

Citation impact

Total citations: 730

FWCI: 11.66
Percentile: 100%
References: 60

Authors

3

Topics & keywords

Keywords

Feature selection
Variable (mathematics)
Stability (learning theory)
Computer science
Data mining
Identification (biology)
Selection (genetic algorithm)
Permutation (music)

UN Sustainable Development Goals

Life on Land

Funding

Bundesministerium für Bildung und Forschung
Award: 01ZX1510