Article | International Conference on Learning Representations | Apr 24, 2017 | Closed access

A Simple but Tough-to-Beat Baseline for Sentence Embeddings

Princeton University

Abstract

The success of neural network methods for computing word embeddings has motivated methods for generating semantic embeddings of longer pieces of text, such as sentences and paragraphs. Surprisingly, Wieting et al. (ICLR'16) showed that such complicated methods are outperformed, especially in out-of-domain (transfer learning) settings, by simpler methods involving mild retraining of word embeddings and basic linear regression. The method of Wieting et al. requires retraining with a substantial labeled dataset such as the Paraphrase Database (Ganitkevitch et al., 2013). The current paper goes further, showing that the following completely unsupervised sentence embedding is a formidable baseline: Use word embeddings…

Citation impact

Total citations: 1,053
FWCI: 75.65
Percentile: 100%
References: 0

Authors

3

Topics & keywords

Keywords
  • Computer science
  • Artificial intelligence
  • Sentence
  • Natural language processing
  • Word (group theory)
  • Paraphrase
  • Smoothing
  • Simple (philosophy)
UN Sustainable Development Goals
  • Quality Education