Dice Loss for Data-imbalanced NLP Tasks

Li, Xiaoya; Sun, Xiaofei; Meng, Yuxian; Liang, Junjun; Wu, Fei; Li, Jiwei

doi:10.18653/v1/2020.acl-main.45

Article · Jan 1, 2020 · Gold OA

Zhejiang University · Shannon Applied Biotechnology Centre

Indexed in Crossref

Abstract

Many NLP tasks, such as tagging and machine reading comprehension (MRC), face a severe data imbalance problem: negative examples significantly outnumber positive ones, and the huge number of easy-negative examples overwhelms training. The most commonly used criterion, cross entropy, is in effect accuracy-oriented, which creates a discrepancy between training and test: at training time, each training instance contributes equally to the objective function, while at test time the F1 score gives more weight to positive examples.
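The discrepancy described in the abstract can be illustrated with a small sketch. The toy data (5 positives among 100 examples), the function names, and the `smooth` parameter are assumptions made for illustration only. The `soft_dice_loss` below is a generic smoothed Sørensen–Dice form, not necessarily the exact loss proposed in the paper.

```python
import math

def accuracy(y_true, y_pred):
    # Fraction of examples labelled correctly; every example counts equally.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred):
    # F1 for the positive class; true negatives do not appear in the formula.
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def mean_cross_entropy(y_true, probs):
    # Binary cross entropy averaged over all examples (equal weight each).
    return -sum(math.log(p if t == 1 else 1.0 - p)
                for t, p in zip(y_true, probs)) / len(y_true)

def soft_dice_loss(y_true, probs, smooth=1.0):
    # Generic smoothed Dice loss (assumed form, for illustration only).
    intersection = sum(t * p for t, p in zip(y_true, probs))
    total = sum(y_true) + sum(probs)
    return 1.0 - (2.0 * intersection + smooth) / (total + smooth)

# Hypothetical imbalanced dataset: 5 positives, 95 negatives.
y_true = [1] * 5 + [0] * 95
all_negative = [0] * 100   # a classifier that never predicts the positive class
probs = [0.1] * 100        # its (assumed) predicted positive-class probabilities

print("accuracy:", accuracy(y_true, all_negative))   # 0.95
print("F1:", f1_score(y_true, all_negative))         # 0.0
print("mean cross entropy:", mean_cross_entropy(y_true, probs))
print("soft Dice loss:", soft_dice_loss(y_true, probs))
```

Under these assumptions, the always-negative classifier reaches 95% accuracy and a small average cross entropy while its F1 is zero. The Dice-style loss stays high because it depends on the overlap with the positive class.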

Citation impact

Total citations: 578

FWCI: 39.99
Percentile: 100%
References: 73

Authors: 6

Topics & keywords

Keywords

Dice
Computer science
Artificial intelligence
Natural language processing
Machine learning
Task (project management)
Cross entropy
Support vector machine

UN Sustainable Development Goals

Quality Education

Funding

National Natural Science Foundation of China
Awards: 61751209, 61625107