An Empirical Comparison of Model Validation Techniques for Defect Prediction Models

Tantithamthavorn, Chakkrit; McIntosh, Shane; Hassan, Ahmed E.; Matsumoto, Kenichi

doi:10.1109/tse.2016.2584050

Article · IEEE Transactions on Software Engineering · Jun 23, 2016 · Closed access

Nara Institute of Science and Technology · Queen's University

Indexed in Crossref

Abstract

Defect prediction models help software quality assurance teams to allocate their limited resources to the most defect-prone modules. Model validation techniques, such as k-fold cross-validation, use historical data to estimate how well a model will perform in the future. However, little is known about how accurate the estimates of model validation techniques tend to be. In this paper, we investigate the bias and variance of model validation techniques in the domain of defect prediction. Analysis of 101 public defect datasets suggests that 77 percent of them are highly susceptible to producing unstable results; selecting an appropriate model validation technique is a critical experimental design choice.…
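The abstract's central idea, that a validation technique produces an estimate of future performance whose bias and variance can themselves be studied, can be sketched in a few lines. The Python below is a minimal illustration under stated assumptions, not the authors' experimental pipeline: it implements plain (unstratified) k-fold cross-validation, repeats it with different random shuffles, and reports the spread of the resulting estimates as a rough view of variance. The names k_fold_indices, k_fold_estimate, threshold_trainer, and repeated_estimates are hypothetical; the one-feature decision stump stands in for a real defect model, accuracy stands in for whatever performance measure a study would actually use, and the data are synthetic.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple

# Hypothetical representation: a module is (feature vector, defect label 0/1).
Row = Tuple[List[float], int]
Model = Callable[[List[float]], int]
Trainer = Callable[[Sequence[Row]], Model]


def k_fold_indices(n: int, k: int, rng: random.Random) -> List[List[int]]:
    """Shuffle indices 0..n-1 and deal them into k folds whose sizes differ by at most one."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def k_fold_estimate(data: Sequence[Row], train: Trainer, k: int, rng: random.Random) -> float:
    """Plain k-fold CV: each fold is held out once, scored by a model fit on the other k-1 folds."""
    fold_scores = []
    for fold in k_fold_indices(len(data), k, rng):
        held_out = set(fold)
        training_rows = [row for i, row in enumerate(data) if i not in held_out]
        model = train(training_rows)
        correct = sum(model(data[i][0]) == data[i][1] for i in fold)
        fold_scores.append(correct / len(fold))
    return statistics.mean(fold_scores)  # accuracy, used here only for brevity


def threshold_trainer(rows: Sequence[Row]) -> Model:
    """Hypothetical stand-in learner: a decision stump on feature 0 chosen by training accuracy."""
    best_cut, best_acc = float("inf"), -1.0  # cut = inf means "predict clean for every module"
    for cut in sorted({x[0] for x, _ in rows}) + [float("inf")]:
        acc = sum((x[0] >= cut) == bool(y) for x, y in rows) / len(rows)
        if acc > best_acc:
            best_cut, best_acc = cut, acc
    return lambda features: int(features[0] >= best_cut)


def repeated_estimates(
    data: Sequence[Row], train: Trainer, k: int, repeats: int, seed: int = 0
) -> List[float]:
    """Repeat k-fold CV with fresh shuffles; the spread of the results approximates estimate variance."""
    rng = random.Random(seed)
    return [k_fold_estimate(data, train, k, rng) for _ in range(repeats)]


if __name__ == "__main__":
    data_rng = random.Random(42)
    data: List[Row] = []
    for _ in range(100):  # synthetic illustrative data, not from any real defect dataset
        x = data_rng.random()
        noisy = x + 0.3 * data_rng.random()
        data.append(([x], int(noisy > 0.9)))
    estimates = repeated_estimates(data, threshold_trainer, k=10, repeats=20)
    print(f"mean={statistics.mean(estimates):.3f} stdev={statistics.stdev(estimates):.3f}")
```

Because each repetition reshuffles modules into different folds, the estimates differ from run to run; a larger standard deviation across repetitions is one symptom of the kind of instability the abstract describes. Bias, by contrast, cannot be read off this output alone: it requires comparing the estimate against the model's performance on data it has never seen.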

Citation impact

563 total citations

FWCI: 99.37
Percentile: 100%
References: 143

Authors: 4

Keywords

Computer science
Variance
Context
Cross-validation
Model validation
Sample
Data mining
Predictive modelling
