PPDB: The Paraphrase Database
Johns Hopkins University · University of Pennsylvania · +1 more institution
Abstract
We present the 1.0 release of our paraphrase database, PPDB. Its English portion, PPDB:Eng, contains over 220 million paraphrase pairs, consisting of 73 million phrasal and 8 million lexical paraphrases, as well as 140 million paraphrase patterns, which capture many meaning-preserving syntactic transformations. The paraphrases are extracted from bilingual parallel corpora totaling over 100 million sentence pairs and over 2 billion English words. We also release PPDB:Spa, a collection of 196 million Spanish paraphrases. Each paraphrase pair in PPDB contains a set of associated scores, including paraphrase probabilities derived from the bitext data and a variety of monolingual distributional similarity scores…
Citation impact
- FWCI: 100.32
- Percentile: 100%
- References: 34
Authors
3
Topics & keywords
- Paraphrase
- Natural language processing
- Computer science
- Artificial intelligence
- Sentence
- Database