A Simple but Tough-to-Beat Baseline for Sentence Embeddings
Abstract
The success of neural network methods for computing word embeddings has motivated methods for generating semantic embeddings of longer pieces of text, such as sentences and paragraphs. Surprisingly, Wieting et al. (ICLR'16) showed that such complicated methods are outperformed, especially in out-of-domain (transfer learning) settings, by simpler methods involving mild retraining of word embeddings and basic linear regression. The method of Wieting et al. requires retraining with a substantial labeled dataset such as the Paraphrase Database (Ganitkevitch et al., 2013). The current paper goes further, showing that the following completely unsupervised sentence embedding is a formidable baseline: Use word embeddings…
Citation impact
1,053 total citations
- FWCI: 75.65
- Percentile: 100%
- References: 0
Authors: 3
Topics & keywords
Topics
Keywords
- Computer science
- Artificial intelligence
- Sentence
- Natural language processing
- Word (group theory)
- Paraphrase
- Smoothing
- Simple (philosophy)
UN Sustainable Development Goals
- Quality Education