On Over-fitting in Model Selection and Subsequent Selection Bias in Performance Evaluation

Cawley, Gavin C.; Talbot, Nicola L. C.

doi:10.5555/1756006.1859921

Article · Mar 1, 2010 · Closed access

Abstract

Model selection strategies for machine learning algorithms typically involve the numerical optimisation of an appropriate model selection criterion, often based on an estimator of generalisation performance, such as k-fold cross-validation. The error of such an estimator can be broken down into bias and variance components. While unbiasedness is often cited as a beneficial quality of a model selection criterion, we demonstrate that a low variance is at least as important, as a non-negligible variance introduces the potential for over-fitting in model selection as well as in training the model. While this observation is in hindsight perhaps rather obvious, the degradation in performance due to over-fitting…
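
The point about variance can be illustrated with a small simulation. This is a minimal sketch and not the authors' experimental protocol: it assumes a set of hypothetical candidate models that all share the same true error, and it assumes each validation estimate is that true error plus independent Gaussian noise. The function name `selection_bias_demo` and all parameter values are illustrative choices. Selecting the candidate with the lowest noisy estimate and then reporting that same estimate gives an optimistically biased figure, even though every individual estimate is unbiased.

```python
import random
import statistics


def selection_bias_demo(n_candidates=50, true_error=0.30, noise_sd=0.05,
                        n_trials=2000, seed=0):
    """Return the average estimated error of the 'selected' candidate.

    Assumption (illustrative only): all candidates have identical true
    error, and each validation estimate is unbiased with Gaussian noise.
    """
    rng = random.Random(seed)
    selected_estimates = []
    for _ in range(n_trials):
        # One noisy, unbiased error estimate per candidate model.
        estimates = [rng.gauss(true_error, noise_sd)
                     for _ in range(n_candidates)]
        # Model selection: keep the candidate that looks best.
        selected_estimates.append(min(estimates))
    return statistics.mean(selected_estimates)


TRUE_ERROR = 0.30
mean_selected = selection_bias_demo(true_error=TRUE_ERROR)
# Gap between the true error and the reported error of the selected model.
optimism = TRUE_ERROR - mean_selected
# With a single candidate there is no selection, so no systematic gap.
mean_single = selection_bias_demo(n_candidates=1, true_error=TRUE_ERROR)

print(f"mean reported error after selection: {mean_selected:.3f}")
print(f"optimism relative to true error:     {optimism:.3f}")
print(f"mean reported error, one candidate:  {mean_single:.3f}")
```

Under these assumptions, raising `noise_sd` or `n_candidates` widens the gap, which is why an estimate used for selection should not be reused as the final performance figure.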

Citation impact

Total citations: 1,974

FWCI: 20.52
Percentile: 100%
References: 54

Authors: 2

Topics & keywords

Keywords

Selection (genetic algorithm)
Model selection
Hindsight bias
Variance (accounting)
Estimator
Computer science
Machine learning
Artificial intelligence
