Dice Loss for Data-imbalanced NLP Tasks
Zhejiang University · Shannon Applied Biotechnology Centre
Abstract
Many NLP tasks, such as tagging and machine reading comprehension (MRC), face a severe data imbalance issue: negative examples significantly outnumber positive ones, and the huge number of easy-negative examples overwhelms training. The most commonly used cross-entropy criterion is actually accuracy-oriented, which creates a discrepancy between training and test: at training time, each training instance contributes equally to the objective function, while at test time the F1 score is more concerned with positive examples.
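The gap between an accuracy-oriented view and F1 can be illustrated with a small sketch. The toy labels and the two hypothetical predictors below are assumptions made for illustration only and are not data from the paper. The sketch shows that a predictor which labels everything negative scores high accuracy but zero F1, while a predictor that recovers most positives can have lower accuracy yet a much higher F1.

```python
def accuracy(y_true, y_pred):
    """Fraction of examples labelled correctly; every example counts equally."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_score(y_true, y_pred):
    """Binary F1 for the positive class (label 1): 2*TP / (2*TP + FP + FN)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical imbalanced data: 95 negatives followed by 5 positives.
y_true = [0] * 95 + [1] * 5

# Predictor A (assumed): always predicts negative.
pred_all_negative = [0] * 100

# Predictor B (assumed): 6 false positives, finds 4 of the 5 positives.
pred_finds_positives = [1] * 6 + [0] * 89 + [1] * 4 + [0] * 1

acc_a = accuracy(y_true, pred_all_negative)
f1_a = f1_score(y_true, pred_all_negative)
acc_b = accuracy(y_true, pred_finds_positives)
f1_b = f1_score(y_true, pred_finds_positives)

print(f"A: accuracy={acc_a:.2f}, F1={f1_a:.2f}")
print(f"B: accuracy={acc_b:.2f}, F1={f1_b:.2f}")
```

On this toy data, predictor A wins on accuracy and predictor B wins on F1, which is the kind of mismatch between an equally weighted training objective and an F1-based evaluation that the abstract describes.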
Citation impact
- FWCI: 39.99
- Percentile: 100%
- References: 73
Authors (6)
- Xiaoya Li (corresponding author)
  Zhejiang University, Shannon Applied Biotechnology Centre
- Xiaofei Sun
  Zhejiang University
- Yuxian Meng
  Shannon Applied Biotechnology Centre, Zhejiang University
- Junjun Liang
  Shannon Applied Biotechnology Centre, Zhejiang University
- Fei Wu
  Zhejiang University, Shannon Applied Biotechnology Centre
Topics & keywords
- Dice
- Computer science
- Artificial intelligence
- Natural language processing
- Machine learning
- Task (project management)
- Cross entropy
- Support vector machine
- Quality Education