Cross‐validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure
University of Freiburg · Wright State University · and 6 other institutions
Abstract
Ecological data often show temporal, spatial, hierarchical (random effects), or phylogenetic structure. Modern statistical approaches increasingly account for such dependencies. However, when performing cross‐validation, these structures are regularly ignored, resulting in serious underestimation of predictive error. One cause of the poor performance of uncorrected (random) cross‐validation, often noted by modellers, is that dependence structures in the data persist as dependence structures in model residuals, violating the assumption of independence. Even more concerning, because it is often overlooked, is that structured data also provides ample opportunity for overfitting with non‐causal predictors. This…
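The central claim above, that ignoring dependence makes random cross-validation underestimate predictive error, can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the paper's own procedure: it assumes a temporally autocorrelated AR(1) series (phi = 0.95), uses the time index as a deliberately non-causal predictor in a 1-nearest-neighbour model, and compares random 5-fold assignment with contiguous temporal blocks, one common form of blocked cross-validation. All function names are hypothetical.

```python
import bisect
import random


def simulate_ar1(n, phi=0.95, seed=0):
    """Simulate a stationary AR(1) series x[t] = phi * x[t-1] + N(0, 1) noise (assumed data model)."""
    rng = random.Random(seed)
    # Draw the first value from the stationary distribution, variance 1 / (1 - phi^2).
    x = [rng.gauss(0.0, (1.0 / (1.0 - phi ** 2)) ** 0.5)]
    for _ in range(1, n):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x


def random_folds(n, k, seed=0):
    """Assign observations to k folds at random, ignoring any temporal structure."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    folds = [0] * n
    for position, idx in enumerate(order):
        folds[idx] = position % k  # equal-sized folds
    return folds


def blocked_folds(n, k):
    """Assign contiguous blocks of time steps to folds, so close neighbours share a fold."""
    return [t * k // n for t in range(n)]


def nearest_neighbour_cv_mse(y, folds, k):
    """Cross-validated MSE of a 1-nearest-neighbour model on the time index.

    The time index is a non-causal predictor: it only appears useful because
    observations close in time are autocorrelated.
    """
    total, count = 0.0, 0
    for fold in range(k):
        train = [t for t in range(len(y)) if folds[t] != fold]  # sorted by construction
        for t in range(len(y)):
            if folds[t] != fold:
                continue
            # Closest training time steps lie on either side of the insertion point.
            i = bisect.bisect_left(train, t)
            candidates = train[max(i - 1, 0):i + 1]
            nearest = min(candidates, key=lambda s: abs(s - t))
            total += (y[t] - y[nearest]) ** 2
            count += 1
    return total / count


y = simulate_ar1(500)
k = 5
random_mse = nearest_neighbour_cv_mse(y, random_folds(len(y), k), k)
blocked_mse = nearest_neighbour_cv_mse(y, blocked_folds(len(y), k), k)
print(f"random CV MSE:  {random_mse:.2f}")
print(f"blocked CV MSE: {blocked_mse:.2f}")
```

Under random folds, almost every held-out point has a training neighbour one time step away, so the error estimate mostly reflects autocorrelation rather than genuine predictive skill. Contiguous blocks force predictions across larger gaps and yield a clearly larger error, closer to what the model would achieve when predicting beyond the sampled neighbourhood. The same idea carries over to spatial blocks, to levels of a random effect, or to clades in a phylogeny, by assigning whole groups rather than single observations to folds.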
Citation impact
- FWCI: 82.06
- Percentile: 100%
- References: 106
Authors: 14
Topics & keywords
- Overfitting
- Random forest
- Computer science
- Cross-validation
- Econometrics
- Autoregressive model
- Contrast (vision)
- Extrapolation