Article · Jun 1, 2013 · Closed access

PPDB: The Paraphrase Database

Johns Hopkins University · University of Pennsylvania · +1 more institution

Abstract

We present the 1.0 release of our paraphrase database, PPDB. Its English portion, PPDB:Eng, contains over 220 million paraphrase pairs, consisting of 73 million phrasal and 8 million lexical paraphrases, as well as 140 million paraphrase patterns, which capture many meaning-preserving syntactic transformations. The paraphrases are extracted from bilingual parallel corpora totaling over 100 million sentence pairs and over 2 billion English words. We also release PPDB:Spa, a collection of 196 million Spanish paraphrases. Each paraphrase pair in PPDB contains a set of associated scores, including paraphrase probabilities derived from the bitext data and a variety of monolingual distributional similarity scores…

Citation impact

Total citations: 656
FWCI: 100.32
Percentile: 100%
References: 34

Authors

3 authors

Topics & keywords

Keywords
  • Paraphrase
  • Natural language processing
  • Computer science
  • Artificial intelligence
  • Sentence
  • Database