Article · Briefings in Bioinformatics · Sep 19, 2017 · Hybrid OA

Evaluation of variable selection methods for random forests and omics data sets

Christian-Albrechts-Universität zu Kiel · Institute of Molecular Biology

Indexed in: Crossref, DOAJ, PubMed

Abstract

Machine learning methods, and in particular random forests, are promising approaches for prediction based on high-dimensional omics data sets. They provide variable importance measures to rank predictors according to their predictive power. If building a prediction model is the main goal of a study, a minimal set of variables with good prediction performance is often selected. However, if the objective is to identify the variables involved in order to find active networks and pathways, approaches that aim to select all relevant variables should be preferred. We evaluated several variable selection procedures based on simulated data as well as publicly available experimental methylation and gene expression data. Our…

Citation impact

Total citations: 730
FWCI: 11.66
Percentile: 100%
References: 60

Authors

3

Topics & keywords

Keywords
  • Feature selection
  • Variable (mathematics)
  • Stability (learning theory)
  • Computer science
  • Data mining
  • Identification (biology)
  • Selection (genetic algorithm)
  • Permutation (music)
UN Sustainable Development Goals
  • Life on Land

Funding