Overview of random forest methodology and practical guidance with emphasis on computational biology and bioinformatics

Zimmer Biomet (Germany) · Ludwig-Maximilians-Universität München · +1 more institution

Indexed in: Crossref, DataCite

Abstract

The random forest (RF) algorithm by Leo Breiman has become a standard data analysis tool in bioinformatics. It has shown excellent performance in settings where the number of variables is much larger than the number of observations, can cope with complex interaction structures as well as highly correlated variables, and returns measures of variable importance. This paper synthesizes 10 years of RF development with emphasis on applications to bioinformatics and computational biology. Special attention is paid to practical aspects such as the selection of parameters, available RF implementations, and important pitfalls and biases of RF and its variable importance measures (VIMs). The paper surveys recent…

Citation impact

  • Total citations: 921
  • FWCI: 8.60
  • Percentile: 100%
  • References: 85

Authors: 4

Topics & keywords

Keywords
  • Implementation
  • Computer science
  • Random forest
  • Context (archaeology)
  • Variable (mathematics)
  • Emphasis (telecommunications)
  • Data science
  • Selection (genetic algorithm)
UN Sustainable Development Goals
  • Life on Land