Article | Mar 1, 2010 | Closed access
On Over-fitting in Model Selection and Subsequent Selection Bias in Performance Evaluation
Abstract
Model selection strategies for machine learning algorithms typically involve the numerical optimisation of an appropriate model selection criterion, often based on an estimator of generalisation performance, such as k-fold cross-validation. The error of such an estimator can be broken down into bias and variance components. While unbiasedness is often cited as a beneficial quality of a model selection criterion, we demonstrate that a low variance is at least as important, as a non-negligible variance introduces the potential for over-fitting in model selection as well as in training the model. While this observation is in hindsight perhaps rather obvious, the degradation in performance due to over-fitting…
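The effect of estimator variance described in the abstract can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the paper's experimental protocol: every candidate model is assumed to have the same true error rate, each validation estimate is a noisy average over a small sample, and the names `noisy_error_estimate` and `selection_bias_demo` are hypothetical helpers introduced for illustration.

```python
import random


def noisy_error_estimate(true_error, n_samples, rng):
    """Estimate an error rate as the fraction of mistakes on n_samples cases.

    Each case is an independent Bernoulli trial, so the estimate is unbiased
    but has variance true_error * (1 - true_error) / n_samples.
    """
    mistakes = sum(rng.random() < true_error for _ in range(n_samples))
    return mistakes / n_samples


def selection_bias_demo(n_candidates=50, n_val=40, n_test=40,
                        n_repeats=500, true_error=0.5, seed=0):
    """Average validation and test error of the candidate picked by validation.

    Assumption: all candidates share the same true error, so any apparent
    advantage of the selected candidate is purely noise in the criterion.
    """
    rng = random.Random(seed)
    total_val = 0.0
    total_test = 0.0
    for _ in range(n_repeats):
        # Noisy model selection criterion evaluated for every candidate.
        val_scores = [noisy_error_estimate(true_error, n_val, rng)
                      for _ in range(n_candidates)]
        # Model selection: keep the candidate with the lowest estimate.
        best = min(range(n_candidates), key=val_scores.__getitem__)
        total_val += val_scores[best]
        # Fresh, independent data gives an unbiased view of the winner.
        total_test += noisy_error_estimate(true_error, n_test, rng)
    return total_val / n_repeats, total_test / n_repeats


mean_val, mean_test = selection_bias_demo()
print(f"mean selected validation error: {mean_val:.3f}")
print(f"mean independent test error:    {mean_test:.3f}")
```

In this sketch the selected candidate's validation error comes out well below the true rate of 0.5, while its error on fresh data stays close to 0.5. The gap is produced entirely by the variance of the selection criterion, which is why an estimate that was already used for selection is an optimistic measure of performance.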
Citation impact
- Total citations: 1,974
- FWCI: 20.52
- Percentile: 100%
- References: 54
Authors: 2
Topics & keywords
Keywords
- Selection (genetic algorithm)
- Model selection
- Hindsight bias
- Variance (accounting)
- Estimator
- Computer science
- Machine learning
- Artificial intelligence