Learning Deep Transformer Models for Machine Translation

Wang, Qiang; Li, Bei; Xiao, Tong; Zhu, Jingbo; Li, Changliang; Wong, Derek F.; Chao, Lidia S.

doi:10.18653/v1/p19-1176

Preprint · Jan 1, 2019 · Gold OA

Northeastern University · University of Macau · +1 more institution

Indexed in Crossref

Abstract

Transformer is the state-of-the-art model in recent machine translation evaluations. Two strands of research are promising to improve models of this kind: the first uses wide networks (a.k.a. Transformer-Big) and has been the de facto standard for development of the Transformer system, and the other uses deeper language representation but faces the difficulty arising from learning deep networks. Here, we continue the line of research on the latter. We claim that a truly deep Transformer model can surpass the Transformer-Big counterpart by 1) proper use of layer normalization and 2) a novel way of passing the combination of previous layers to the next. On WMT’16 English-German and NIST OpenMT’12 Chinese-English…

Citation impact

Total citations: 621

FWCI: 42.83
Percentile: 100%
References: 44

Authors: 7

Topics & keywords

Keywords

Transformer
Machine translation
Computer science
Deep learning
NIST
Artificial intelligence
Encoder
Normalization (sociology)

UN Sustainable Development Goals

Quality Education
