MASS: Masked Sequence to Sequence Pre-training for Language Generation
Indexed in: arXiv, DataCite
Abstract
Pre-training and fine-tuning approaches such as BERT have achieved great success in language understanding by transferring knowledge from a rich-resource pre-training task to low/zero-resource downstream tasks. Inspired by the success of BERT, we propose MAsked Sequence to Sequence pre-training (MASS) for encoder-decoder based language generation tasks. MASS adopts the encoder-decoder framework to reconstruct a sentence fragment given the remaining part of the sentence: its encoder takes a sentence with a randomly masked fragment (several consecutive tokens) as input, and its decoder tries to predict this masked fragment. In this way, MASS can jointly train the encoder and decoder to develop the capability of…
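The masking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `[MASK]` symbol, the function name `make_mass_example`, and its parameters are hypothetical, and the sketch covers only what the abstract states (the encoder sees the sentence with one consecutive fragment masked, and the decoder's target is that fragment).

```python
import random

MASK = "[MASK]"  # hypothetical mask symbol, chosen for illustration only


def make_mass_example(tokens, span_len, start=None, rng=None):
    """Build one (encoder_input, decoder_target, start) triple.

    A contiguous fragment of `span_len` tokens is replaced by MASK symbols
    on the encoder side; the decoder target is the original fragment.
    If `start` is None, the fragment position is drawn at random.
    """
    if not 0 < span_len <= len(tokens):
        raise ValueError("span_len must be between 1 and len(tokens)")
    rng = rng or random.Random()
    if start is None:
        # Any start position that keeps the fragment inside the sentence.
        start = rng.randrange(len(tokens) - span_len + 1)
    if not 0 <= start <= len(tokens) - span_len:
        raise ValueError("fragment does not fit inside the sentence")
    end = start + span_len
    encoder_input = tokens[:start] + [MASK] * span_len + tokens[end:]
    decoder_target = tokens[start:end]
    return encoder_input, decoder_target, start


if __name__ == "__main__":
    sentence = "the quick brown fox jumps over the lazy dog".split()
    enc, tgt, pos = make_mass_example(sentence, span_len=3, start=2)
    print(" ".join(enc))  # the quick [MASK] [MASK] [MASK] over the lazy dog
    print(" ".join(tgt))  # brown fox jumps
```

Passing `start` explicitly makes the example deterministic; leaving it as `None` selects the fragment position at random, which matches the "randomly masked fragment" wording of the abstract.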
Citation impact
- Total citations: 580
- FWCI: n/a
- Percentile: n/a
- References: 0
Authors
5
Topics & keywords
Keywords
- Computer science
- Machine translation
- Automatic summarization
- Encoder
- Sentence
- Language model
- Natural language processing
- Artificial intelligence
UN Sustainable Development Goals
- Quality Education