Automatic evaluation of machine translation quality using longest common subsequence and skip-bigram statistics
University of Southern California · Marina Del Rey Hospital
Abstract
In this paper we describe two new objective automatic evaluation methods for machine translation. The first method is based on the longest common subsequence between a candidate translation and a set of reference translations. Longest common subsequence naturally takes into account sentence-level structure similarity and automatically identifies the longest co-occurring in-sequence n-grams. The second method relaxes strict n-gram matching to skip-bigram matching. A skip-bigram is any pair of words in their sentence order. Skip-bigram co-occurrence statistics measure the overlap of skip-bigrams between a candidate translation and a set of reference translations. The empirical results show that both methods correlate with…
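The two matching ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`lcs_length`, `skip_bigrams`, `skip_bigram_matches`) and the whitespace tokenization are assumptions made for illustration, and the sketch only counts raw matches against a single reference. It does not reproduce whatever normalization or multi-reference handling the paper's final scores use.

```python
from collections import Counter
from itertools import combinations


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists.

    Standard dynamic program, keeping only the previous row of the table.
    """
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def skip_bigrams(tokens):
    """Multiset of all ordered word pairs (w_i, w_j) with i < j."""
    return Counter(combinations(tokens, 2))


def skip_bigram_matches(candidate, reference):
    """Number of skip-bigrams shared by candidate and reference (clipped counts)."""
    shared = skip_bigrams(candidate) & skip_bigrams(reference)
    return sum(shared.values())


# Illustrative sentences, tokenized on whitespace (an assumption of this sketch).
reference = "police killed the gunman".split()
candidate_a = "police kill the gunman".split()
candidate_b = "the gunman kill police".split()

print(lcs_length(reference, candidate_a), skip_bigram_matches(candidate_a, reference))
print(lcs_length(reference, candidate_b), skip_bigram_matches(candidate_b, reference))
```

In this toy case both candidates share the same three words with the reference, but the first keeps them in reference order. It therefore has a longer common subsequence (3 versus 2) and more matching skip-bigrams (3 versus 1), which is the kind of word-order sensitivity the abstract describes.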
Citation impact
- FWCI: 20.81
- Percentile: 100%
- References: 22
Authors
- 2
Topics & keywords
- Bigram
- Computer science
- Machine translation
- Longest common subsequence problem
- Artificial intelligence
- Set (abstract data type)
- Natural language processing
- Subsequence
- Quality Education