Article | Jan 1, 2016 | Gold OA
Sequence-Level Knowledge Distillation
Indexed in Crossref
Abstract
Neural machine translation (NMT) offers a novel alternative formulation of translation that is potentially simpler than statistical approaches. However, to reach competitive performance, NMT models need to be exceedingly large. In this paper we consider applying knowledge distillation approaches to the problem of NMT. We demonstrate that standard knowledge distillation applied to word-level prediction can be effective for NMT, and also introduce two novel sequence-level versions of knowledge distillation that further improve performance and, somewhat surprisingly, seem to eliminate the need for beam search (even when applied on the original teacher model). Our best student model runs 10 times faster than its state-of-the-art teacher…
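To make the distinction in the abstract concrete, here is a minimal sketch of the two ideas; it is not the authors' implementation. It assumes word-level distillation means minimizing the cross-entropy between the teacher's and the student's per-position word distributions, and that a sequence-level variant trains the student on translations decoded by the teacher. The function names (`word_level_kd_loss`, `sequence_level_targets`) and the `teacher_translate` callable are hypothetical, introduced only for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def word_level_kd_loss(teacher_logits, student_logits):
    """Mean cross-entropy between teacher and student word distributions.

    Both arguments are lists of per-position logit vectors over the same
    vocabulary (hypothetical inputs; a real system would use tensors).
    """
    loss = 0.0
    for t_row, s_row in zip(teacher_logits, student_logits):
        q = softmax(t_row)  # teacher distribution (soft targets)
        p = softmax(s_row)  # student distribution
        loss -= sum(qk * math.log(pk) for qk, pk in zip(q, p))
    return loss / len(teacher_logits)


def sequence_level_targets(source_sentences, teacher_translate):
    """Pair each source sentence with the teacher's decoded translation.

    `teacher_translate` is an assumed callable (for example, beam search
    with the teacher); the student is then trained on these pairs as if
    they were ordinary parallel data.
    """
    return [(src, teacher_translate(src)) for src in source_sentences]


if __name__ == "__main__":
    teacher = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
    student = [[1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    print(word_level_kd_loss(teacher, student))
    print(sequence_level_targets(["ein Haus"], lambda s: s.upper()))
```

The word-level loss is smallest when the student reproduces the teacher's distribution at every position, while the sequence-level step only changes the training data, so the student can keep its usual training objective.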
Citation impact
774 total citations
- FWCI: 57.42
- Percentile: 100%
- References: 62
Authors: 2
Topics & keywords
Topics
Keywords
- Distillation
- Pruning
- Computer science
- Beam search
- Machine translation
- Sequence (biology)
- Artificial intelligence
- Baseline (sea)
UN Sustainable Development Goals
- Quality Education