Preprint · arXiv (Cornell University) · Apr 20, 2020 · Green OA

MPNet: Masked and Permuted Pre-training for Language Understanding

Nanjing University of Science and Technology · Microsoft Research (United Kingdom)

Indexed in: arXiv, DataCite

Abstract

BERT adopts masked language modeling (MLM) for pre-training and is one of the most successful pre-training models. Since BERT neglects dependency among predicted tokens, XLNet introduces permuted language modeling (PLM) for pre-training to address this problem. However, XLNet does not leverage the full position information of a sentence and thus suffers from position discrepancy between pre-training and fine-tuning. In this paper, we propose MPNet, a novel pre-training method that inherits the advantages of BERT and XLNet and avoids their limitations. MPNet leverages the dependency among predicted tokens through permuted language modeling (vs. MLM in BERT), and takes auxiliary position information as input to…
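The contrast drawn in the abstract can be made concrete with a toy sketch. This is an illustration under simplifying assumptions, not the authors' implementation: the function name `conditioning_contexts`, its parameters, and the reduction of each objective to "which token contents and which positions are visible when predicting a target" are all hypothetical choices made here. Positions are 0-based, and the last `n_predicted` entries of the permutation are treated as the tokens to predict.

```python
from typing import Dict, List, Set, Tuple

# (positions whose token content is visible, positions the model knows exist)
Context = Tuple[Set[int], Set[int]]


def conditioning_contexts(
    permutation: List[int], n_predicted: int
) -> Dict[str, Dict[int, Context]]:
    """Toy comparison of what each objective conditions on per predicted token.

    Simplified, hypothetical model of the three schemes:
      - "mlm":   content of non-predicted tokens only, but all positions
                 (masked slots still occupy their positions).
      - "plm":   content of everything earlier in the permutation, but only
                 the positions of those tokens plus the target itself.
      - "mpnet": content as in "plm", positions as in "mlm".
    """
    if not 0 < n_predicted <= len(permutation):
        raise ValueError("n_predicted must be between 1 and len(permutation)")

    all_positions = set(permutation)
    kept = permutation[:-n_predicted]        # tokens that are never predicted
    predicted = permutation[-n_predicted:]   # tokens predicted in this order

    out: Dict[str, Dict[int, Context]] = {"mlm": {}, "plm": {}, "mpnet": {}}
    for step, target in enumerate(predicted):
        # Everything that precedes the target in the permuted order.
        earlier = set(kept) | set(predicted[:step])
        out["mlm"][target] = (set(kept), all_positions)
        out["plm"][target] = (earlier, earlier | {target})
        out["mpnet"][target] = (earlier, all_positions)
    return out


if __name__ == "__main__":
    contexts = conditioning_contexts([0, 2, 4, 3, 5, 1], n_predicted=3)
    for scheme, per_target in contexts.items():
        for target, (content, positions) in per_target.items():
            print(scheme, target, sorted(content), sorted(positions))
```

For the permutation `[0, 2, 4, 3, 5, 1]` with three predicted tokens, the second predicted token (position 5) sees the content of position 3 under the "plm" and "mpnet" rules but not under "mlm", which is the neglected dependency among predicted tokens. Only "mlm" and "mpnet" know about all six positions at that step, which is the position information the abstract says XLNet lacks.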

Citation impact

Total citations: 506
References: 26

Authors

5 authors

Topics & keywords

Keywords
  • Computer science
  • Leverage (statistics)
  • Language model
  • Dependency (UML)
  • Sentence
  • Margin (machine learning)
  • Artificial intelligence
  • Natural language processing
UN Sustainable Development Goals
  • Quality Education