The impact of K selection in K‑fold cross-validation on bias and variance in supervised learning models
The University of Sydney · Chengdu Institute of Information Technology (China)
Abstract
K-fold cross-validation is a widely used technique for estimating the generalisation performance of supervised machine learning models. However, the effect of the number of folds (k) on bias-variance behaviour across models and datasets is not fully understood. This study examines how varying k, from 3 to 20, relates to estimates of bias and variance across four classification algorithms, evaluated on twelve datasets of varying sizes. These four algorithms are Support Vector Machine (SVM), Decision Tree (DT), Logistic Regression (LR), and k-Nearest Neighbours (KNN). We operationalise bias as the difference between the mean cross-validated training accuracy and the held-out test accuracy, and variance as…
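The bias measure described in the abstract can be illustrated with a minimal, standard-library-only sketch. Everything specific here is an assumption made for illustration and not the study's setup: the synthetic two-class data from the hypothetical `make_data`, the nearest-centroid classifier (standing in for SVM, DT, LR and KNN), the 240/60 train/test split, and the choice to measure test accuracy with a model refit on the full training split. The variance measure is left out because its definition is truncated in the abstract above.

```python
import random
from statistics import mean

def make_data(n, seed=0):
    """Synthetic two-class 2-D data (illustrative assumption, not the paper's datasets)."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        label = i % 2
        centre = 0.0 if label == 0 else 1.5
        data.append(((rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)), label))
    rng.shuffle(data)
    return data

def fit_centroids(rows):
    """Nearest-centroid classifier: store the mean point of each class."""
    centroids = {}
    for label in {y for _, y in rows}:
        pts = [x for x, y in rows if y == label]
        centroids[label] = tuple(mean(c) for c in zip(*pts))
    return centroids

def accuracy(centroids, rows):
    """Fraction of rows whose nearest centroid matches the true label."""
    def predict(x):
        return min(
            centroids,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])),
        )
    return mean(1.0 if predict(x) == y else 0.0 for x, y in rows)

def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds whose sizes differ by at most one."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def bias_estimate(train, test, k):
    """Mean training accuracy across the k folds minus held-out test accuracy."""
    train_accs = []
    for fold in k_fold_indices(len(train), k):
        held = set(fold)
        fit_rows = [row for i, row in enumerate(train) if i not in held]
        model = fit_centroids(fit_rows)
        train_accs.append(accuracy(model, fit_rows))
    # Assumption: test accuracy comes from a model refit on the full training split.
    test_acc = accuracy(fit_centroids(train), test)
    return mean(train_accs) - test_acc

data = make_data(300)
train, test = data[:240], data[240:]
bias_by_k = {k: bias_estimate(train, test, k) for k in range(3, 21)}
for k, b in bias_by_k.items():
    print(f"k={k:2d}  bias={b:+.4f}")
```

The loop covers k from 3 to 20, matching the range stated in the abstract. With a library such as scikit-learn, the same quantities would more typically come from its cross-validation utilities with training scores enabled; the hand-rolled folds here only keep the sketch dependency-free.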
Citation impact
- FWCI: 116.53
- Percentile: 100%
- References: 36
Authors: 3
Topics & keywords
- Variance (accounting)
- Support vector machine
- Preprocessor
- Random forest
- Replication (statistics)
- Logistic regression
- Selection (genetic algorithm)
- Feature selection