Collecting Highly Parallel Data for Paraphrase Evaluation

Chen, David; Dolan, William B.

Article · Jun 19, 2011 · Closed access

The University of Texas at Austin · Microsoft (United States)

Abstract

A lack of standard datasets and evaluation metrics has prevented the field of paraphrasing from making the kind of rapid progress enjoyed by the machine translation community over the last 15 years. We address both problems by presenting a novel data collection framework that produces highly parallel text data relatively inexpensively and on a large scale. The highly parallel nature of this data allows us to use simple n-gram comparisons to measure both the semantic adequacy and lexical dissimilarity of paraphrase candidates. In addition to being simple and efficient to compute, experiments show that these metrics correlate highly with human judgments.
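To make the idea of an n-gram comparison concrete, the following is a minimal sketch of a lexical dissimilarity score, assuming whitespace tokenization, lowercasing, and n-gram orders 1 through 4. The function names and these choices are illustrative assumptions and are not taken from the abstract; the paper's own metric may differ in its details.

```python
def ngrams(tokens, n):
    """Return the set of n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_dissimilarity(source, candidate, max_n=4):
    """Average, over n-gram orders, of the fraction of candidate
    n-grams that do NOT appear in the source sentence.

    0.0 means the candidate reuses only source n-grams;
    1.0 means it shares no n-grams with the source.
    Tokenization and max_n are assumptions made for this sketch.
    """
    src = source.lower().split()
    cand = candidate.lower().split()
    scores = []
    for n in range(1, max_n + 1):
        cand_ngrams = ngrams(cand, n)
        if not cand_ngrams:
            # Candidate is shorter than n tokens; skip this order.
            continue
        overlap = len(cand_ngrams & ngrams(src, n))
        scores.append(1.0 - overlap / len(cand_ngrams))
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    print(ngram_dissimilarity("a man is playing a guitar",
                              "a man plays the guitar"))
```

A score of this kind captures only how much the wording changed. Semantic adequacy would need a separate comparison of the candidate against other parallel descriptions of the same content, which this sketch does not attempt.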

Citation impact

Total citations: 807

FWCI: 28.06
Percentile: 100%
References: 32

Authors

2

Topics & keywords

Keywords

Paraphrase
Computer science
Simple (philosophy)
Machine translation
Natural language processing
Field (mathematics)
Measure (data warehouse)
Artificial intelligence

UN Sustainable Development Goals

Quality Education