Preprint · Jan 1, 2019 · Gold OA

Transformer-XL: Attentive Language Models beyond a Fixed-Length Context

Carnegie Mellon University · Google (United States)

Indexed in Crossref

Abstract

Transformers have a potential of learning longer-term dependency, but are limited by a fixed-length context in the setting of language modeling. We propose a novel neural architecture Transformer-XL that enables learning dependency beyond a fixed length without disrupting temporal coherence. It consists of a segment-level recurrence mechanism and a novel positional encoding scheme. Our method not only enables capturing longer-term dependency, but also resolves the context fragmentation problem. As a result, Transformer-XL learns dependency that is 80% longer than RNNs and 450% longer than vanilla Transformers, achieves better performance on both short and long sequences, and is up to 1,800+ times faster than…

Citation impact

Total citations: 3,138
FWCI: 253.60
Percentile: 100%
References: 80

Authors

6

Topics & keywords

Keywords
  • Perplexity
  • Computer science
  • Language model
  • Transformer
  • Treebank
  • Artificial intelligence
  • Hyperparameter
  • Natural language processing
UN Sustainable Development Goals
  • Quality Education

Funding