Data Quality: Some Comments on the NASA Software Defect Datasets
Brunel University of London · Xi'an Jiaotong University · University of the Arts London
Abstract
Background: Self-evidently, empirical analyses rely upon the quality of their data. Likewise, replications rely upon accurate reporting and upon using the same, rather than similar, versions of datasets. In recent years, there has been much interest in using machine learners to classify software modules into defect-prone and not defect-prone categories. The publicly available NASA datasets have been extensively used as part of this research. Objective: This short note investigates the extent to which published analyses based on the NASA defect datasets are meaningful and comparable. Method: We analyze the five studies published in the IEEE Transactions on Software Engineering since 2007 that have utilized these…
Citation impact
- FWCI: 72.73
- Percentile: 100%
- References: 25
Authors
- Martin Shepperd (corresponding author), Brunel University of London
- Qinbao Song, Xi'an Jiaotong University
- Zhongbin Sun, Xi'an Jiaotong University
- Carolyn Mair, University of the Arts London
Topics & keywords
- Computer science
- Preprocessor
- Replication (statistics)
- Software
- Data pre-processing
- Data mining
- Quality (philosophy)
- Software quality