Preprint · Jan 1, 2019 · Gold OA

Learning Deep Transformer Models for Machine Translation

Northeastern University · University of Macau · +1 more institution

Indexed in Crossref

Abstract

Transformer is the state-of-the-art model in recent machine translation evaluations. Two strands of research are promising to improve models of this kind: the first uses wide networks (a.k.a. Transformer-Big) and has been the de facto standard for development of the Transformer system, and the other uses deeper language representation but faces the difficulty arising from learning deep networks. Here, we continue the line of research on the latter. We claim that a truly deep Transformer model can surpass the Transformer-Big counterpart by 1) proper use of layer normalization and 2) a novel way of passing the combination of previous layers to the next. On WMT’16 English-German and NIST OpenMT’12 Chinese-English…

Citation impact

Total citations: 621
FWCI: 42.83
Percentile: 100%
References: 44

Authors

7 authors

Topics & keywords

Keywords
  • Transformer
  • Machine translation
  • Computer science
  • Deep learning
  • NIST
  • Artificial intelligence
  • Encoder
  • Normalization (sociology)
UN Sustainable Development Goals
  • Quality Education