Learning Deep Transformer Models for Machine Translation
Northeastern University · University of Macau · +1 more institution
Abstract
Transformer is the state-of-the-art model in recent machine translation evaluations. Two strands of research are promising to improve models of this kind: the first uses wide networks (a.k.a. Transformer-Big) and has been the de facto standard for development of the Transformer system, and the other uses deeper language representation but faces the difficulty arising from learning deep networks. Here, we continue the line of research on the latter. We claim that a truly deep Transformer model can surpass the Transformer-Big counterpart by 1) proper use of layer normalization and 2) a novel way of passing the combination of previous layers to the next. On WMT’16 English-German and NIST OpenMT’12 Chinese-English…
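The two ingredients named in the abstract can be illustrated with a small sketch. This is only one possible reading, not the authors' implementation. It assumes that "proper use of layer normalization" means normalizing the input of each sublayer while leaving the residual path untouched (a placement often called pre-norm), and that "the combination of previous layers" is a weighted sum of all earlier layer outputs. The names `toy_sublayer`, `combine_previous_layers`, and `deep_stack` are hypothetical, and the uniform weights stand in for weights that would be learned in a real model.

```python
import math

def layer_norm(x, eps=1e-6):
    """Normalize a vector to zero mean and unit variance (no learned gain or bias)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def combine_previous_layers(outputs, weights):
    """Weighted sum over the outputs of all earlier layers (one weight per layer)."""
    dim = len(outputs[0])
    return [sum(w * y[i] for w, y in zip(weights, outputs)) for i in range(dim)]

def toy_sublayer(x):
    """Hypothetical stand-in for attention or feed-forward: a fixed elementwise map."""
    return [0.5 * v + 0.1 for v in x]

def deep_stack(x, num_layers):
    """Run a toy deep stack that uses both assumed ingredients."""
    outputs = [x]  # outputs[0] plays the role of the embedding
    for _ in range(num_layers):
        # Assumption: uniform weights; a trained model would learn these.
        weights = [1.0 / len(outputs)] * len(outputs)
        h = combine_previous_layers(outputs, weights)
        # Assumed pre-norm placement: normalize the sublayer input,
        # then add the result back onto the unnormalized residual path.
        y = [a + b for a, b in zip(h, toy_sublayer(layer_norm(h)))]
        outputs.append(y)
    # Final normalization of the top layer's output.
    return layer_norm(outputs[-1])

if __name__ == "__main__":
    print(deep_stack([1.0, 2.0, 3.0, 4.0], num_layers=6))
```

Because every layer reads a combination of all earlier outputs rather than only the one directly below it, lower-layer information has a short path to the top of the stack, which is one intuition for why such a scheme might ease the training of deep networks.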
Citation impact
- FWCI: 42.83
- Percentile: 100%
- References: 44
Authors
- 7
Topics & keywords
- Transformer
- Machine translation
- Computer science
- Deep learning
- NIST
- Artificial intelligence
- Encoder
- Normalization (sociology)
- Quality Education