Preprint · arXiv (Cornell University) · May 7, 2019 · Green OA

MASS: Masked Sequence to Sequence Pre-training for Language Generation

Indexed in: arXiv, DataCite

Abstract

Pre-training and fine-tuning, e.g., BERT, have achieved great success in language understanding by transferring knowledge from a rich-resource pre-training task to low/zero-resource downstream tasks. Inspired by the success of BERT, we propose MAsked Sequence to Sequence pre-training (MASS) for encoder-decoder based language generation tasks. MASS adopts the encoder-decoder framework to reconstruct a sentence fragment given the remaining part of the sentence: its encoder takes a sentence with a randomly masked fragment (several consecutive tokens) as input, and its decoder tries to predict this masked fragment. In this way, MASS can jointly train the encoder and decoder to develop the capability of…
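The masking step described in the abstract can be illustrated with a minimal Python sketch. It only shows how one training pair might be built: the encoder input is the sentence with a run of consecutive tokens replaced by a mask symbol, and the decoder target is that run. The function names, the `[MASK]` symbol, and the way the fragment length is chosen are assumptions made for illustration; the abstract does not specify them, and this is not the authors' implementation.

```python
import random
from typing import List, Optional, Tuple

MASK = "[MASK]"  # assumed placeholder symbol; the abstract does not name one


def mask_fragment(tokens: List[str], start: int, length: int,
                  mask_token: str = MASK) -> Tuple[List[str], List[str]]:
    """Mask tokens[start:start+length] and return (encoder_input, decoder_target)."""
    if length < 1 or start < 0 or start + length > len(tokens):
        raise ValueError("fragment must lie inside the sentence")
    end = start + length
    # Encoder sees the sentence with the fragment replaced by mask symbols.
    encoder_input = tokens[:start] + [mask_token] * length + tokens[end:]
    # Decoder is asked to predict exactly the masked-out tokens.
    decoder_target = tokens[start:end]
    return encoder_input, decoder_target


def random_mask_fragment(tokens: List[str], length: int,
                         rng: Optional[random.Random] = None
                         ) -> Tuple[List[str], List[str]]:
    """Pick the fragment start uniformly at random (hypothetical helper)."""
    if length < 1 or length > len(tokens):
        raise ValueError("length must be between 1 and len(tokens)")
    rng = rng or random.Random()
    start = rng.randint(0, len(tokens) - length)  # randint is inclusive at both ends
    return mask_fragment(tokens, start, length)


tokens = "the quick brown fox jumps over the lazy dog".split()
enc, dec = mask_fragment(tokens, start=2, length=3)
print(enc)  # ['the', 'quick', '[MASK]', '[MASK]', '[MASK]', 'over', 'the', 'lazy', 'dog']
print(dec)  # ['brown', 'fox', 'jumps']
```

In an actual pre-training setup, pairs like `(enc, dec)` would be fed to an encoder-decoder model; the model and training loop are outside the scope of this sketch.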

Citation impact

Total citations: 580
References: 0

Authors

5

Topics & keywords

Keywords
  • Computer science
  • Machine translation
  • Automatic summarization
  • Encoder
  • Sentence
  • Language model
  • Natural language processing
  • Artificial intelligence
UN Sustainable Development Goals
  • Quality Education