Transformer-XL: Attentive Language Models beyond a Fixed-Length Context

Dai, Zihang; Yang, Zhilin; Yang, Yiming; Carbonell, Jaime; Le, Quoc V.; Salakhutdinov, Ruslan

doi:10.18653/v1/p19-1285

Preprint · Jan 1, 2019 · Gold OA

Carnegie Mellon University · Google (United States)

Indexed in Crossref

Abstract

Transformers have a potential of learning longer-term dependency, but are limited by a fixed-length context in the setting of language modeling. We propose a novel neural architecture Transformer-XL that enables learning dependency beyond a fixed length without disrupting temporal coherence. It consists of a segment-level recurrence mechanism and a novel positional encoding scheme. Our method not only enables capturing longer-term dependency, but also resolves the context fragmentation problem. As a result, Transformer-XL learns dependency that is 80% longer than RNNs and 450% longer than vanilla Transformers, achieves better performance on both short and long sequences, and is up to 1,800+ times faster than…

Citation impact

Total citations: 3,138

FWCI: 253.60
Percentile: 100%
References: 80

Authors: 6

Topics & keywords

Keywords

Perplexity
Computer science
Language model
Transformer
Treebank
Artificial intelligence
Hyperparameter
Natural language processing

UN Sustainable Development Goals

Quality Education
