VSURF: An R Package for Variable Selection Using Random Forests
Centre National de la Recherche Scientifique · Université Côte d'Azur
Abstract
This paper describes the R package VSURF. Based on random forests, and for both regression and classification problems, it returns two subsets of variables. The first is a subset of important variables that includes some redundancy, which can be relevant for interpretation. The second is a smaller subset that tries to avoid redundancy and focuses more closely on the prediction objective. The two-stage strategy starts with a preliminary ranking of the explanatory variables using the random forests permutation-based importance score, then proceeds with a stepwise forward strategy for variable introduction. The two proposals can be obtained automatically using data-driven default values,…
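The two-stage idea in the abstract (rank by importance, then introduce variables stepwise) can be illustrated with a minimal Python sketch. This is a simplified illustration under stated assumptions, not the VSURF implementation: the importance scores and the error function are hypothetical stand-ins for a random forest's permutation importance and its prediction error, and the names `rank_variables`, `keep_important`, `forward_selection`, `threshold`, and `min_gain` are invented for this example.

```python
from typing import Callable, Dict, List, Sequence


def rank_variables(importance: Dict[str, float]) -> List[str]:
    """Return variable names sorted by decreasing importance score."""
    return sorted(importance, key=lambda name: importance[name], reverse=True)


def keep_important(importance: Dict[str, float], threshold: float) -> List[str]:
    """Keep ranked variables whose score exceeds the threshold.

    Redundant variables survive this step, as in the first subset
    described in the abstract.
    """
    return [n for n in rank_variables(importance) if importance[n] > threshold]


def forward_selection(
    ranked: Sequence[str],
    error_of: Callable[[List[str]], float],
    min_gain: float = 0.0,
) -> List[str]:
    """Walk the ranking and keep a variable only if adding it
    lowers the error by more than min_gain."""
    selected: List[str] = []
    best_error = float("inf")
    for name in ranked:
        candidate = selected + [name]
        err = error_of(candidate)
        if best_error - err > min_gain:
            selected = candidate
            best_error = err
    return selected


# Hypothetical importance scores; in practice these would come from
# a random forest's permutation-based importance.
toy_importance = {"x3": 0.05, "noise": 0.001, "x1": 0.30, "x2": 0.28}


def toy_error(variables: List[str]) -> float:
    """Hypothetical error: x1 and x2 carry the same signal (redundant),
    x3 carries a separate signal, and noise carries none."""
    err = 1.0
    if "x1" in variables or "x2" in variables:
        err -= 0.5
    if "x3" in variables:
        err -= 0.2
    return err


important = keep_important(toy_importance, threshold=0.01)
predictive = forward_selection(important, toy_error, min_gain=0.01)
print(important)   # ['x1', 'x2', 'x3']
print(predictive)  # ['x1', 'x3']
```

In this toy setting the first subset keeps both `x1` and its redundant copy `x2`, while the forward step drops `x2` because it does not reduce the error once `x1` is in the model.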
Citation impact
- FWCI: 14.04
- Percentile: 100%
- References: 52
Authors
- 3
Topics & keywords
- R package
- Selection (genetic algorithm)
- Random forest
- Variable (mathematics)
- Computer science
- Forestry
- Statistics
- Environmental science
- Life in Land