The impact of K selection in K‑fold cross-validation on bias and variance in supervised learning models

Abedin, Tahsinul; Xu, Haoming; Uddin, Shahadat

doi:10.1038/s41598-026-37247-x

Article · Scientific Reports · Jan 23, 2026 · Gold OA

The University of Sydney · Chengdu Institute of Information Technology (China)

Indexed in: Crossref, DOAJ, PubMed

Abstract

K-fold cross-validation is a widely used technique for estimating the generalisation performance of supervised machine learning models. However, the effect of the number of folds (k) on bias-variance behaviour across models and datasets is not fully understood. This study examines how varying k, from 3 to 20, relates to estimates of bias and variance across four classification algorithms, evaluated on twelve datasets of varying sizes. The four algorithms are Support Vector Machine (SVM), Decision Tree (DT), Logistic Regression (LR), and k-Nearest Neighbours (KNN). We operationalise bias as the difference between the mean cross-validated training accuracy and the held-out test accuracy, and variance as…
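The bias measure described in the abstract can be illustrated with a minimal sketch. It is not the authors' code and rests on several assumptions: "cross-validated training accuracy" is read as accuracy on the training portion of each fold, the held-out test accuracy comes from a model refit on the full development set, the data are synthetic two-class Gaussian blobs, and a toy nearest-centroid classifier stands in for the four algorithms named above. The helper names (`k_fold_indices`, `fit_centroids`, `bias_estimate`, `make_blobs`) are hypothetical. The variance measure is omitted because its definition is cut off in the abstract.

```python
import random
from statistics import mean


def k_fold_indices(n, k, seed=0):
    """Shuffle range(n) and deal the indices into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_centroids(X, y):
    """Toy stand-in classifier (assumption): one mean vector per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Return the label of the closest centroid (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
    )


def accuracy(centroids, X, y):
    hits = [1.0 if predict(centroids, x) == t else 0.0 for x, t in zip(X, y)]
    return sum(hits) / len(hits)


def bias_estimate(X_dev, y_dev, X_test, y_test, k, seed=0):
    """Mean training-fold accuracy minus held-out test accuracy (assumed reading)."""
    train_accs = []
    for fold in k_fold_indices(len(X_dev), k, seed):
        held = set(fold)
        X_tr = [x for i, x in enumerate(X_dev) if i not in held]
        y_tr = [t for i, t in enumerate(y_dev) if i not in held]
        model = fit_centroids(X_tr, y_tr)
        train_accs.append(accuracy(model, X_tr, y_tr))
    # Assumption: the test score comes from a model refit on all development data.
    final_model = fit_centroids(X_dev, y_dev)
    return mean(train_accs) - accuracy(final_model, X_test, y_test)


def make_blobs(n, seed):
    """Synthetic two-class data (assumption): Gaussian blobs centred at 0 and 2."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        centre = 0.0 if label == 0 else 2.0
        X.append([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)])
        y.append(label)
    return X, y


X_dev, y_dev = make_blobs(200, seed=1)
X_test, y_test = make_blobs(100, seed=2)

# Sweep k over the range stated in the abstract (3 to 20 inclusive).
results = {k: bias_estimate(X_dev, y_dev, X_test, y_test, k) for k in range(3, 21)}
for k, b in results.items():
    print(f"k={k:2d}  bias estimate={b:+.4f}")
```

Replacing the toy classifier and the synthetic data with real models and datasets would leave the loop over k unchanged; only `fit_centroids`, `predict`, and `make_blobs` are placeholders.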

Citation impact

Total citations: 5

FWCI: 116.53
Percentile: 100%
References: 36

Authors: 3

Topics & keywords

Keywords

Variance (accounting)
Support vector machine
Preprocessor
Random forest
Replication (statistics)
Logistic regression
Selection (genetic algorithm)
Feature selection