MASS: Masked Sequence to Sequence Pre-training for Language Generation

Song, Kaitao; Tan, Xu; Qin, Tao; Lu, Jianfeng; Liu, Tie‐Yan

doi:10.48550/arxiv.1905.02450

Preprint; arXiv (Cornell University); May 7, 2019; Green OA

Indexed in: arXiv, DataCite

Abstract

Pre-training and fine-tuning approaches such as BERT have achieved great success in language understanding by transferring knowledge from rich-resource pre-training tasks to low/zero-resource downstream tasks. Inspired by the success of BERT, we propose MAsked Sequence to Sequence pre-training (MASS) for encoder-decoder based language generation tasks. MASS adopts the encoder-decoder framework to reconstruct a sentence fragment given the remaining part of the sentence: its encoder takes as input a sentence with a randomly masked fragment (several consecutive tokens), and its decoder tries to predict this masked fragment. In this way, MASS can jointly train the encoder and decoder to develop the capability of…
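The masking step described in the abstract can be illustrated with a minimal sketch. It only builds one (encoder input, decoder target) pair from a tokenized sentence and does not train a model. The function name `mass_mask`, the `[MASK]` symbol, and the `span_len` and `start` parameters are assumptions made for illustration and are not taken from the paper.

```python
import random

MASK = "[MASK]"  # hypothetical mask symbol, chosen for illustration


def mass_mask(tokens, span_len, start=None, rng=None):
    """Build one MASS-style example from a list of tokens.

    Returns (encoder_input, decoder_target):
      - encoder_input: the sentence with one consecutive fragment masked
      - decoder_target: the masked fragment the decoder should predict
    """
    if not 0 < span_len <= len(tokens):
        raise ValueError("span_len must be between 1 and len(tokens)")
    rng = rng if rng is not None else random
    if start is None:
        # Pick a random start so the whole fragment fits in the sentence.
        start = rng.randint(0, len(tokens) - span_len)
    if start < 0 or start + span_len > len(tokens):
        raise ValueError("fragment does not fit inside the sentence")
    end = start + span_len
    encoder_input = tokens[:start] + [MASK] * span_len + tokens[end:]
    decoder_target = tokens[start:end]
    return encoder_input, decoder_target


tokens = "the quick brown fox jumps".split()
enc, tgt = mass_mask(tokens, span_len=2, start=1)
print(enc)  # ['the', '[MASK]', '[MASK]', 'fox', 'jumps']
print(tgt)  # ['quick', 'brown']
```

Passing `start` explicitly makes the example deterministic; leaving it as `None` samples the fragment position at random, which matches the "randomly masked fragment" wording of the abstract.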

Citation impact

Total citations: 580

FWCI: —
Percentile: —
References: 0

Authors: 5

Topics & keywords

Keywords

Computer science
Machine translation
Automatic summarization
Encoder
Sentence
Language model
Natural language processing
Artificial intelligence

UN Sustainable Development Goals

Quality Education
