Assessing Model Fit by Cross-Validation

Hawkins, Douglas M.; Basak, Subhash C.; Mills, Denise

doi:10.1021/ci025626i

Article · Journal of Chemical Information and Computer Sciences · Jan 24, 2003 · Closed access

Douglas M. Hawkins · Subhash C. Basak · Denise Mills

University of Minnesota, Duluth · Minnesota Department of Natural Resources

Indexed in: Crossref, PubMed

Abstract

When QSAR models are fitted, it is important to validate any fitted model, that is, to check that it is plausible that its predictions will carry over to fresh data not used in the model-fitting exercise. There are two standard ways of doing this: using a separate hold-out test sample, and the computationally much more burdensome leave-one-out cross-validation, in which the entire pool of available compounds is used both to fit the model and to assess its validity. We show by theoretical argument and empirical study of a large QSAR data set that when the available sample size is small (in the dozens or scores rather than the hundreds), holding a portion of it back for testing is wasteful, and that it is much better to use…
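
The two validation strategies named in the abstract can be illustrated with a minimal sketch. It is not the authors' code or data: it assumes a one-descriptor linear model, synthetic data, and hypothetical helper names (`fit_line`, `loo_q2`, `holdout_r2`). The leave-one-out statistic is the usual q² = 1 - PRESS / SST; the hold-out score here uses the test-set mean as its reference, which is one of several conventions.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def loo_q2(xs, ys):
    """Leave-one-out q^2: every compound is predicted once by a model
    fitted to all the other compounds, so no data are held back."""
    n = len(xs)
    press = 0.0  # predictive residual sum of squares
    for i in range(n):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (a + b * xs[i])) ** 2
    my = sum(ys) / n
    sst = sum((y - my) ** 2 for y in ys)
    return 1.0 - press / sst

def holdout_r2(xs, ys, n_test):
    """Hold-out score: fit on the first n - n_test points and score the
    last n_test points (assumption: test-set mean as the reference)."""
    a, b = fit_line(xs[:-n_test], ys[:-n_test])
    tx, ty = xs[-n_test:], ys[-n_test:]
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(tx, ty))
    my = sum(ty) / len(ty)
    sst = sum((y - my) ** 2 for y in ty)
    return 1.0 - sse / sst

if __name__ == "__main__":
    # Synthetic stand-in for a small QSAR sample (30 "compounds").
    random.seed(0)
    xs = [random.uniform(0.0, 10.0) for _ in range(30)]
    ys = [2.0 + 0.5 * x + random.gauss(0.0, 1.0) for x in xs]
    print("LOO q^2 on all 30 points:", round(loo_q2(xs, ys), 3))
    print("Hold-out R^2 (20 fit / 10 test):", round(holdout_r2(xs, ys, 10), 3))
```

With a sample this small, the hold-out score rests on only ten points and the model is fitted on only twenty, whereas the leave-one-out loop fits on twenty-nine points each time and scores all thirty; the cost is n model fits instead of one.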

Citation impact

Total citations: 758

FWCI: 23.29
Percentile: 100%
References: 21

Authors: 3

Topics & keywords

Keywords

Cross-validation
Computer science
Model validation
Set (abstract data type)
Sample (material)
Quantitative structure–activity relationship
Data mining
Test (biology)

Funding

U.S. Air Force