Probabilistic Extension of Precision, Recall, and F1 Score for More Thorough Evaluation of Classification Models

Yacouby, Reda; Axman, Dustin

doi:10.18653/v1/2020.eval4nlp-1.9

Article, Jan 1, 2020, Gold OA

Indexed in Crossref

Abstract

In pursuit of the perfect supervised NLP classifier, razor-thin margins and low-resource test sets can make modeling decisions difficult. Popular metrics such as Accuracy, Precision, and Recall are often insufficient as they fail to give a complete picture of the model's behavior. We present a probabilistic extension of Precision, Recall, and F1 score, which we refer to as confidence-Precision (cPrecision), confidence-Recall (cRecall), and confidence-F1 (cF1) respectively. The proposed metrics address some of the challenges faced when evaluating large-scale NLP systems, specifically when the model's confidence score assignments have an impact on the system's behavior. We describe four key benefits of our…
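The abstract names the metrics but does not give their formulas. As an illustration only, the sketch below assumes a "soft count" reading: the confidence a model assigns to a class is summed over examples instead of counting hard predictions. The function name `confidence_prf`, its arguments, and the exact formulas are assumptions made for this example, not definitions taken from the text above.

```python
from typing import Sequence, Tuple


def confidence_prf(
    probs: Sequence[Sequence[float]],
    labels: Sequence[int],
    cls: int,
) -> Tuple[float, float, float]:
    """Soft precision, recall, and F1 for one class (hypothetical helper).

    probs[i][k] is the model's confidence that example i belongs to class k.
    labels[i] is the true class of example i.
    Assumed definitions (not confirmed by the abstract):
      soft TP = confidence given to `cls` on examples whose true label is `cls`
      soft FP = confidence given to `cls` on examples with any other label
    """
    soft_tp = sum(p[cls] for p, y in zip(probs, labels) if y == cls)
    soft_fp = sum(p[cls] for p, y in zip(probs, labels) if y != cls)
    n_pos = sum(1 for y in labels if y == cls)

    # Guard against empty denominators so the function never divides by zero.
    c_precision = soft_tp / (soft_tp + soft_fp) if (soft_tp + soft_fp) > 0 else 0.0
    c_recall = soft_tp / n_pos if n_pos > 0 else 0.0
    denom = c_precision + c_recall
    c_f1 = 2 * c_precision * c_recall / denom if denom > 0 else 0.0
    return c_precision, c_recall, c_f1
```

Under these assumptions, two models that make identical hard predictions can still receive different scores when one assigns higher confidence to the correct class, which is the kind of distinction the abstract says threshold-based metrics miss.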

Citation impact

Total citations: 718

FWCI: 18.91
Percentile: 100%
References: 20

Topics & keywords

Keywords

Computer science
Probabilistic logic
Artificial intelligence
Precision and recall
Machine learning
Recall
Robustness (evolution)
Classifier (UML)

UN Sustainable Development Goals

Decent work and economic growth