Evaluation of variable selection methods for random forests and omics data sets
Christian-Albrechts-Universität zu Kiel · Institute of Molecular Biology
Abstract
Machine learning methods, and random forests in particular, are promising approaches for prediction based on high-dimensional omics data sets. They provide variable importance measures that rank predictors according to their predictive power. If building a prediction model is the main goal of a study, a minimal set of variables with good prediction performance is often selected. However, if the objective is to identify the variables involved in order to find active networks and pathways, approaches that aim to select all relevant variables should be preferred. We evaluated several variable selection procedures on simulated data as well as on publicly available experimental methylation and gene expression data. Our…
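The abstract contrasts two goals: selecting a minimal set of variables with good prediction performance, and selecting all relevant variables. The sketch below illustrates that contrast. It assumes scikit-learn and NumPy, and the helper names `shadow_selection` and `minimal_set`, the simulated data, and all parameter values are hypothetical illustrations, not the procedures evaluated in the study. The all-relevant side runs a single round of a Boruta-style comparison against permuted "shadow" variables. The minimal side keeps the smallest top-ranked subset whose out-of-bag accuracy stays within a tolerance of the best.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def shadow_selection(X, y, n_trees=300, seed=0):
    """Hypothetical helper: one round of a Boruta-style shadow comparison.

    Keeps every variable whose impurity importance exceeds the largest
    importance among permuted "shadow" copies (an all-relevant heuristic).
    """
    rng = np.random.default_rng(seed)
    # Permute each column independently: same marginal distribution, no link to y
    shadows = np.column_stack([rng.permutation(col) for col in X.T])
    rf = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    rf.fit(np.hstack([X, shadows]), y)
    p = X.shape[1]
    real_imp = rf.feature_importances_[:p]
    threshold = rf.feature_importances_[p:].max()
    selected = np.flatnonzero(real_imp > threshold)
    return selected, real_imp, threshold


def minimal_set(X, y, ranking, sizes, tol=0.01, n_trees=300, seed=0):
    """Hypothetical helper: smallest top-k subset whose out-of-bag accuracy
    is within `tol` of the best accuracy over the candidate sizes."""
    scores = {}
    for k in sizes:
        rf = RandomForestClassifier(n_estimators=n_trees, oob_score=True,
                                    random_state=seed)
        rf.fit(X[:, ranking[:k]], y)
        scores[k] = rf.oob_score_
    best = max(scores.values())
    k_min = min(k for k, s in scores.items() if s >= best - tol)
    return ranking[:k_min], scores


# Simulated data: with shuffle=False the 5 informative columns come first
X, y = make_classification(n_samples=200, n_features=50, n_informative=5,
                           n_redundant=0, n_repeated=0, shuffle=False,
                           random_state=1)
selected, importance, threshold = shadow_selection(X, y)
ranking = np.argsort(importance)[::-1]  # most important variable first
minimal, oob = minimal_set(X, y, ranking, sizes=[1, 2, 3, 5, 10, 20])
print("all-relevant:", selected.tolist())
print("minimal:", minimal.tolist())
```

Impurity-based `feature_importances_` can favour variables with many possible split points. Permutation importance, for example via `sklearn.inspection.permutation_importance`, is a common alternative. Full Boruta also repeats the shadow comparison over many iterations and applies a statistical test instead of a single threshold.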
Citation impact
- FWCI: 11.66
- Percentile: 100%
- References: 60
Authors: 3
Topics & keywords
- Feature selection
- Variable (mathematics)
- Stability (learning theory)
- Computer science
- Data mining
- Identification (biology)
- Selection (genetic algorithm)
- Permutation
- Life on Land