ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators

Clark, Kevin B.; Luong, Minh-Thang; Le, Quoc V.; Manning, Christopher D.

doi:10.48550/arxiv.2003.10555

Article · arXiv (Cornell University) · Mar 23, 2020 · Green OA

Stanford University · Google (United States)

Indexed in: arXiv, DataCite

Abstract

Masked language modeling (MLM) pre-training methods such as BERT corrupt the input by replacing some tokens with [MASK] and then train a model to reconstruct the original tokens. While they produce good results when transferred to downstream NLP tasks, they generally require large amounts of compute to be effective. As an alternative, we propose a more sample-efficient pre-training task called replaced token detection. Instead of masking the input, our approach corrupts it by replacing some tokens with plausible alternatives sampled from a small generator network. Then, instead of training a model that predicts the original identities of the corrupted tokens, we train a discriminative model that predicts…
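The corruption step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `corrupt_tokens` and `toy_generator`, the `replace_prob` default, and the toy vocabulary are all hypothetical. The toy generator ignores context and samples uniformly, whereas the abstract describes a small generator network producing plausible alternatives. The sketch also assumes the discriminator's training target is a per-token flag marking whether the token differs from the original.

```python
import random


def corrupt_tokens(tokens, sample_replacement, replace_prob=0.15, rng=None):
    """Replace a random subset of tokens and record which positions changed.

    sample_replacement(token, rng) stands in for the generator network
    (hypothetical interface, chosen for this illustration only).
    """
    if rng is None:
        rng = random.Random()
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < replace_prob:
            new_tok = sample_replacement(tok, rng)
        else:
            new_tok = tok
        corrupted.append(new_tok)
        # Assumption: a position counts as "replaced" only if the sampled
        # token actually differs from the original one.
        labels.append(int(new_tok != tok))
    return corrupted, labels


def toy_generator(tok, rng):
    """Context-free stand-in for a generator: uniform draw from a tiny vocabulary."""
    vocab = ["the", "a", "chef", "cooked", "ate", "meal"]
    return rng.choice(vocab)


if __name__ == "__main__":
    sentence = ["the", "chef", "cooked", "the", "meal"]
    corrupted, labels = corrupt_tokens(sentence, toy_generator, 0.3, random.Random(0))
    for original, new, flag in zip(sentence, corrupted, labels):
        print(f"{original:>8} -> {new:<8} replaced={flag}")
```

A discriminative model trained on such pairs would receive `corrupted` as input and `labels` as targets, so every position in the sequence contributes a training signal rather than only the masked subset.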

Citation impact

Total citations: 541

FWCI: —
Percentile: —
References: 48

Authors: 4

Topics & keywords

Keywords

Computer science
Security token
Language model
Transformer
Discriminative model
Generator (circuit theory)
Artificial intelligence
Encoder

UN Sustainable Development Goals

Reduced inequalities
