Article · arXiv (Cornell University) · Jun 22, 2015 · Green OA

Skip-Thought Vectors

University of Toronto · Canadian Institute for Advanced Research · +1 more institution

Indexed in: arXiv, DataCite

Abstract

We describe an approach for unsupervised learning of a generic, distributed sentence encoder. Using the continuity of text from books, we train an encoder-decoder model that tries to reconstruct the surrounding sentences of an encoded passage. Sentences that share semantic and syntactic properties are thus mapped to similar vector representations. We next introduce a simple vocabulary expansion method to encode words that were not seen as part of training, allowing us to expand our vocabulary to a million words. After training our model, we extract and evaluate our vectors with linear models on 8 tasks: semantic relatedness, paraphrase detection, image-sentence ranking, question-type classification and 4…
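The training signal described in the abstract can be illustrated with a small data-preparation sketch. Assuming a passage has already been split into an ordered list of sentences, each interior sentence serves as the encoder input and its two neighbours serve as the reconstruction targets. The names `SkipThoughtExample` and `make_training_triples` are hypothetical, and the example sentences are made up for illustration; the sketch covers only how training examples might be formed, not the encoder-decoder model itself.

```python
from typing import List, NamedTuple


class SkipThoughtExample(NamedTuple):
    """One training example (hypothetical container, not from the paper)."""
    previous: str   # reconstruction target: sentence before the encoded one
    current: str    # encoder input
    following: str  # reconstruction target: sentence after the encoded one


def make_training_triples(sentences: List[str]) -> List[SkipThoughtExample]:
    """Build (previous, current, following) triples from ordered sentences.

    Only sentences with a neighbour on both sides yield a triple, so the
    first and last sentences of a passage appear as targets only.
    """
    return [
        SkipThoughtExample(sentences[i - 1], sentences[i], sentences[i + 1])
        for i in range(1, len(sentences) - 1)
    ]


# Made-up passage of contiguous sentences, used purely for illustration.
passage = [
    "I got back home.",
    "I could see the cat on the steps.",
    "This was strange.",
    "The door was open.",
]
triples = make_training_triples(passage)
```

A passage of n sentences yields n - 2 triples under this scheme, and passages with fewer than three sentences yield none.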

Citation impact

Total citations: 726
References: 38

Authors

7 authors

Topics & keywords

Keywords
  • Computer science
  • Paraphrase
  • Sentence
  • Artificial intelligence
  • Natural language processing
  • Vocabulary
  • Encoder
  • Benchmark (surveying)
UN Sustainable Development Goals
  • Quality Education