Europarl: A Parallel Corpus for Statistical Machine Translation

Koehn, Philipp

Article · Sep 13, 2005 · Closed access

Abstract

We collected a corpus of parallel text in 11 languages from the proceedings of the European Parliament, which are published on the web. This corpus has found widespread use in the NLP community. Here, we focus on its acquisition and its application as training data for statistical machine translation (SMT). We trained SMT systems for 110 language pairs, and the results offer interesting clues about the challenges ahead.
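
The figure of 110 follows from counting translation directions: each of the 11 languages can serve as the source for each of the 10 others, so 11 × 10 = 110 ordered pairs. A minimal sketch of that count is given below. The ISO 639-1 codes are an assumption taken from the original Europarl release rather than from this abstract, and `translation_directions` is a hypothetical helper name used only for illustration.

```python
from itertools import permutations

# Assumed language set: ISO 639-1 codes of the 11 languages in the
# original Europarl release (not listed in the abstract itself).
LANGUAGES = ["da", "de", "el", "en", "es", "fi", "fr", "it", "nl", "pt", "sv"]


def translation_directions(languages):
    """Return every ordered (source, target) pair of distinct languages."""
    return list(permutations(languages, 2))


pairs = translation_directions(LANGUAGES)
print(len(pairs))  # 11 * 10 = 110
```

Ordered pairs are counted because a German-to-English system and an English-to-German system are trained and evaluated separately.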

Citation impact

Total citations: 3,110

FWCI: 84.51
Percentile: 100%
References: 9

Authors

Philipp Koehn (corresponding author)

Topics & keywords

Keywords

Machine translation
Computer science
Natural language processing
Parallel corpora
Focus (optics)
Artificial intelligence
Example-based machine translation
Computer-assisted translation
