Optimal ratio for data splitting

Joseph, V. Roshan

doi:10.1002/sam.11583

Article, Statistical Analysis and Data Mining: The ASA Data Science Journal, Apr 4, 2022, Hybrid OA

V. Roshan Joseph

Georgia Institute of Technology

Abstract

It is common to split a dataset into training and testing sets before fitting a statistical or machine learning model. However, there is no clear guidance on how much data should be used for training and how much for testing. In this article, we show that the optimal training/testing splitting ratio is $\sqrt{p}:1$, where $p$ is the number of parameters in a linear regression model that explains the data well.
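
As a rough illustration of the ratio stated in the abstract, a training/testing ratio of $\sqrt{p}:1$ corresponds to a training fraction of $\sqrt{p}/(\sqrt{p}+1)$. The sketch below only does this arithmetic; the function names `split_fractions` and `split_sizes` are hypothetical, and how $p$ is chosen for a given dataset is outside what this sketch covers.

```python
import math


def split_fractions(p: int) -> tuple[float, float]:
    """Return (train_fraction, test_fraction) for a sqrt(p):1 split.

    p is assumed to be the number of parameters in a linear regression
    model that explains the data well (a user-supplied input here).
    """
    if p < 1:
        raise ValueError("p must be a positive integer")
    root = math.sqrt(p)
    train = root / (root + 1.0)
    return train, 1.0 - train


def split_sizes(n: int, p: int) -> tuple[int, int]:
    """Convert the fractions into integer (n_train, n_test) for n rows."""
    train_fraction, _ = split_fractions(p)
    n_train = round(n * train_fraction)
    return n_train, n - n_train


if __name__ == "__main__":
    # With p = 9 the ratio is 3:1, so 75% of the rows go to training.
    print(split_fractions(9))
    print(split_sizes(1000, 9))
```

Under this formula, $p = 1$ gives an even 1:1 split, and the training share grows slowly as $p$ increases (for example, 4:1 at $p = 16$).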

Citation impact

Total citations: 742

FWCI: 91.98
Percentile: 100%
References: 31

Authors

V. Roshan Joseph (corresponding author)
Georgia Institute of Technology

Topics & keywords

Keywords

Computer science
Training set
Linear regression
Machine learning
Data mining
Statistical hypothesis testing
Artificial intelligence
Statistics