Article | Mar 1, 2010 | Closed access

On Over-fitting in Model Selection and Subsequent Selection Bias in Performance Evaluation

Abstract

Model selection strategies for machine learning algorithms typically involve the numerical optimisation of an appropriate model selection criterion, often based on an estimator of generalisation performance, such as k-fold cross-validation. The error of such an estimator can be broken down into bias and variance components. While unbiasedness is often cited as a beneficial quality of a model selection criterion, we demonstrate that a low variance is at least as important, as a non-negligible variance introduces the potential for over-fitting in model selection as well as in training the model. While this observation is in hindsight perhaps rather obvious, the degradation in performance due to over-fitting…
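
The variance argument in the abstract can be illustrated with a small simulation. The sketch below is not the authors' experiment. It is a deliberately simplified toy in which every candidate model has the same true accuracy, so any advantage the selected candidate shows on the validation set comes purely from the variance of the selection criterion. All names (`noisy_score`, `selection_experiment`) and all parameter values are assumptions chosen for illustration.

```python
import random
import statistics


def noisy_score(true_accuracy, n_samples, rng):
    """Estimate accuracy from a finite sample of n_samples Bernoulli trials."""
    hits = sum(rng.random() < true_accuracy for _ in range(n_samples))
    return hits / n_samples


def selection_experiment(n_candidates=50, n_val=40, n_test=40,
                         true_accuracy=0.7, n_repeats=500, seed=0):
    """Pick the candidate with the best validation score, then re-test it.

    Hypothetical setup: all candidates share the same true accuracy, so the
    winner is chosen on noise alone. Returns the mean validation score and
    the mean fresh-test score of the selected candidate over n_repeats runs.
    """
    rng = random.Random(seed)
    selected_val, selected_test = [], []
    for _ in range(n_repeats):
        # Score every candidate on a small (high-variance) validation set.
        val_scores = [noisy_score(true_accuracy, n_val, rng)
                      for _ in range(n_candidates)]
        best = max(range(n_candidates), key=val_scores.__getitem__)
        selected_val.append(val_scores[best])
        # Evaluate the selected candidate on independent test data.
        selected_test.append(noisy_score(true_accuracy, n_test, rng))
    return statistics.mean(selected_val), statistics.mean(selected_test)


mean_val, mean_test = selection_experiment()
print(f"mean validation score of selected candidate: {mean_val:.3f}")
print(f"mean fresh-test score of selected candidate: {mean_test:.3f}")
```

In this toy setting the validation score of the selected candidate is optimistically biased relative to its score on fresh data, and the gap narrows as `n_val` grows and the variance of the criterion falls. Reporting the validation score of the winner as its performance estimate would therefore overstate it, which is the selection bias named in the title.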

Citation impact

Total citations: 1,974
FWCI: 20.52
Percentile: 100%
References: 54
Authors

2

Topics & keywords

Keywords
  • Selection (genetic algorithm)
  • Model selection
  • Hindsight bias
  • Variance (accounting)
  • Estimator
  • Computer science
  • Machine learning
  • Artificial intelligence