The receiver operating characteristic curve accurately assesses imbalanced datasets
La Jolla Institute for Immunology · Fundação Oswaldo Cruz · +2 more institutions
Abstract
Many problems in biology require looking for a "needle in a haystack," corresponding to a binary classification where there are a few positives within a much larger set of negatives, which is referred to as a class imbalance. The receiver operating characteristic (ROC) curve and the associated area under the curve (AUC) have been reported as ill-suited to evaluating prediction performance on imbalanced problems where performance on the positive minority class is of greater interest, with the precision-recall (PR) curve reported as preferable. We show via simulation and a real case study that this is a misinterpretation of the difference between the ROC and PR spaces, and that the ROC curve is robust to class…
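The contrast between the ROC and PR spaces under class imbalance can be illustrated with a small simulation. The sketch below is not the authors' simulation: it assumes, purely for illustration, that classifier scores are Gaussian (positives drawn from N(1, 1), negatives from N(0, 1)), and the helper names `roc_auc`, `average_precision`, and `simulate` are hypothetical. It keeps the score distributions fixed and changes only the number of negatives.

```python
import random


def roc_auc(scores_pos, scores_neg):
    """ROC AUC via the rank-sum (Mann-Whitney) identity.

    Equals the probability that a random positive outscores a random
    negative. Ties are ignored, which is fine for continuous scores.
    """
    labeled = sorted([(s, 1) for s in scores_pos] + [(s, 0) for s in scores_neg])
    rank_sum = sum(rank for rank, (_, y) in enumerate(labeled, start=1) if y == 1)
    n_pos, n_neg = len(scores_pos), len(scores_neg)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def average_precision(scores_pos, scores_neg):
    """Average precision: mean of precision at the rank of each positive."""
    labeled = sorted(
        [(s, 1) for s in scores_pos] + [(s, 0) for s in scores_neg], reverse=True
    )
    true_positives = 0
    precision_sum = 0.0
    for k, (_, y) in enumerate(labeled, start=1):
        if y == 1:
            true_positives += 1
            precision_sum += true_positives / k
    return precision_sum / len(scores_pos)


def simulate(n_pos, n_neg, seed=0):
    """Draw scores from the assumed Gaussians and return (ROC AUC, AP)."""
    rng = random.Random(seed)
    pos = [rng.gauss(1.0, 1.0) for _ in range(n_pos)]  # assumed positive scores
    neg = [rng.gauss(0.0, 1.0) for _ in range(n_neg)]  # assumed negative scores
    return roc_auc(pos, neg), average_precision(pos, neg)


if __name__ == "__main__":
    # Same classifier quality in every row; only the class ratio changes.
    for n_neg in (1_000, 10_000, 100_000):
        auc, ap = simulate(1_000, n_neg)
        print(f"pos:neg = 1:{n_neg // 1_000:<4d} ROC AUC = {auc:.3f}  AP = {ap:.3f}")
```

Under these assumptions the ROC AUC should stay close to its theoretical value of Φ(1/√2) ≈ 0.76 at every class ratio, because it depends only on the two score distributions. Average precision should fall as negatives are added, because precision depends on prevalence.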
Citation impact
- FWCI: 49.67
- Percentile: 100%
- References: 48
Authors
- 6
Topics & keywords
- Receiver operating characteristic
- Computer science
- Remote sensing
- Statistics
- Mathematics
- Geology