Article · Jun 19, 2011 · Closed access

Collecting Highly Parallel Data for Paraphrase Evaluation

The University of Texas at Austin · Microsoft (United States)

Abstract

A lack of standard datasets and evaluation metrics has prevented the field of paraphrasing from making the kind of rapid progress enjoyed by the machine translation community over the last 15 years. We address both problems by presenting a novel data collection framework that produces highly parallel text data relatively inexpensively and on a large scale. The highly parallel nature of this data allows us to use simple n-gram comparisons to measure both the semantic adequacy and lexical dissimilarity of paraphrase candidates. Besides being simple and efficient to compute, these metrics correlate highly with human judgments in our experiments.
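The abstract does not spell out the two n-gram comparisons. The sketch below illustrates them under stated assumptions: adequacy is approximated by a simplified, BLEU-like clipped n-gram precision against many parallel reference sentences (arithmetic mean over n, no brevity penalty, so it is not BLEU itself), and dissimilarity follows the PINC idea of counting the fraction of candidate n-grams that do not appear in the source sentence. The function names `ngrams`, `adequacy`, and `pinc`, the whitespace tokenization, and the choice `max_n = 4` are hypothetical illustrations, not the authors' code.

```python
from collections import Counter
from typing import List, Set, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous n-grams of a token list (in order, with repeats)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def adequacy(candidate: List[str], references: List[List[str]], max_n: int = 4) -> float:
    """Simplified BLEU-like score: mean clipped n-gram precision vs. many references.

    Assumption: arithmetic mean over n = 1..max_n and no brevity penalty.
    More parallel references give a valid paraphrase more chances to match.
    """
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(candidate, n))
        total = sum(cand_counts.values())
        if total == 0:
            continue  # candidate too short to have n-grams of this order
        # Clip each candidate n-gram count by its highest count in any single reference.
        max_ref: Counter = Counter()
        for ref in references:
            for gram, count in Counter(ngrams(ref, n)).items():
                max_ref[gram] = max(max_ref[gram], count)
        matched = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
        precisions.append(matched / total)
    return sum(precisions) / len(precisions) if precisions else 0.0


def pinc(source: List[str], candidate: List[str], max_n: int = 4) -> float:
    """PINC-style dissimilarity: share of distinct candidate n-grams absent from the source."""
    scores = []
    for n in range(1, max_n + 1):
        cand_set: Set[Tuple[str, ...]] = set(ngrams(candidate, n))
        if not cand_set:
            continue  # no n-grams of this order in the candidate
        src_set = set(ngrams(source, n))
        scores.append(1.0 - len(cand_set & src_set) / len(cand_set))
    return sum(scores) / len(scores) if scores else 0.0
```

For example, with the hypothetical source "a man is slicing a tomato" and candidate "a man is cutting a tomato", `pinc` gives about 0.59: the single-word change leaves most unigrams shared but breaks every 4-gram. Scoring the two directions separately is the point of the design: a candidate that copies its source scores high on adequacy but 0 on dissimilarity, so neither number alone rewards a good paraphrase.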

Citation impact

Total citations: 807
FWCI: 28.06
Percentile: 100%
References: 32

Authors: 2

Topics & keywords

Keywords
  • Paraphrase
  • Computer science
  • Machine translation
  • Natural language processing
  • Artificial intelligence
UN Sustainable Development Goals
  • Quality Education