Preprint · arXiv (Cornell University) · Oct 21, 2020 · Green OA

Beyond English-Centric Multilingual Machine Translation

Meta (Israel)

Indexed in: arXiv, DataCite

Abstract

Existing work in translation demonstrated the potential of massively multilingual machine translation by training a single model able to translate between any pair of languages. However, much of this work is English-Centric by training only on data which was translated from or to English. While this is supported by large sources of training data, it does not reflect translation needs worldwide. In this work, we create a true Many-to-Many multilingual translation model that can translate directly between any pair of 100 languages. We build and open source a training dataset that covers thousands of language directions with supervised data, created through large-scale mining. Then, we explore how to effectively…
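To make the gap between "English-Centric" and "Many-to-Many" concrete, the sketch below counts translation directions for 100 languages. It is only illustrative arithmetic: the language codes are hypothetical placeholders (English plus 99 stand-ins), and the functions `many_to_many_directions` and `english_centric_directions` are invented for this example. The count is of possible directions the model could serve, not of directions that have mined supervised data (the abstract says only "thousands").

```python
from itertools import permutations

def many_to_many_directions(langs):
    """Every ordered (source, target) pair with source != target."""
    return list(permutations(langs, 2))

def english_centric_directions(langs, pivot="en"):
    """Only pairs where the pivot (English) is the source or the target."""
    others = [lang for lang in langs if lang != pivot]
    return [(pivot, lang) for lang in others] + [(lang, pivot) for lang in others]

# Hypothetical placeholder codes: "en" plus 99 stand-in languages (100 total).
langs = ["en"] + [f"lang{i:02d}" for i in range(99)]

print(len(many_to_many_directions(langs)))     # 100 * 99 = 9900
print(len(english_centric_directions(langs)))  # 2 * 99 = 198
```

A direction such as ("lang00", "lang01") appears only in the Many-to-Many set; an English-Centric model would have to pivot through English to cover it.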

Citation impact

468 total citations
81 references

Authors

17 authors

Topics & keywords

Keywords
  • Machine translation
  • Translation (biology)
  • Computer science
  • Natural language processing
  • Artificial intelligence
  • Linguistics
  • Philosophy
  • Chemistry
UN Sustainable Development Goals
  • Quality Education