Article · Jun 19, 2011 · Closed access
Collecting Highly Parallel Data for Paraphrase Evaluation
The University of Texas at Austin · Microsoft (United States)
Abstract
A lack of standard datasets and evaluation metrics has prevented the field of paraphrasing from making the kind of rapid progress enjoyed by the machine translation community over the last 15 years. We address both problems by presenting a novel data collection framework that produces highly parallel text data relatively inexpensively and on a large scale. The highly parallel nature of this data allows us to use simple n-gram comparisons to measure both the semantic adequacy and lexical dissimilarity of paraphrase candidates. In addition to being simple and efficient to compute, experiments show that these metrics correlate highly with human judgments.
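The n-gram comparisons mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the exact formulas, so everything below is an assumption made for illustration: the function names `adequacy` and `lexical_dissimilarity`, whitespace tokenization, set-based overlap, and the default `max_n=4` are hypothetical choices, not the paper's definitions. The idea is that a candidate paraphrase should share n-grams with other parallel descriptions of the same content (adequacy) while sharing few n-grams with the source sentence it rewrites (dissimilarity).

```python
def ngrams(tokens, n):
    """Return the set of n-grams (as tuples) found in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def lexical_dissimilarity(source, candidate, max_n=4):
    """Average, over n = 1..max_n, of the share of candidate n-grams
    that do NOT appear in the source (illustrative, not the paper's formula)."""
    src = source.lower().split()
    cand = candidate.lower().split()
    scores = []
    for n in range(1, max_n + 1):
        cand_ngrams = ngrams(cand, n)
        if not cand_ngrams:
            continue  # candidate has fewer than n tokens
        overlap = len(cand_ngrams & ngrams(src, n))
        scores.append(1.0 - overlap / len(cand_ngrams))
    return sum(scores) / len(scores) if scores else 0.0


def adequacy(candidate, references, max_n=4):
    """Average, over n = 1..max_n, of the share of candidate n-grams
    that appear in at least one parallel reference (illustrative only)."""
    cand = candidate.lower().split()
    refs = [r.lower().split() for r in references]
    scores = []
    for n in range(1, max_n + 1):
        cand_ngrams = ngrams(cand, n)
        if not cand_ngrams:
            continue  # candidate has fewer than n tokens
        ref_ngrams = set().union(*(ngrams(r, n) for r in refs))
        scores.append(len(cand_ngrams & ref_ngrams) / len(cand_ngrams))
    return sum(scores) / len(scores) if scores else 0.0
```

Under these assumptions, a candidate identical to its source scores 0.0 on dissimilarity, and a candidate sharing no words with it scores 1.0. Having many parallel references matters for the adequacy score: the more independent descriptions of the same content are available, the more likely a valid paraphrase is to find its n-grams among them.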
Citation impact
- Total citations: 807
- FWCI: 28.06
- Percentile: 100%
- References: 32
Authors (2)
Topics & keywords
Keywords
- Paraphrase
- Computer science
- Simple (philosophy)
- Machine translation
- Natural language processing
- Field (mathematics)
- Measure (data warehouse)
- Artificial intelligence
UN Sustainable Development Goals
- Quality Education