Article · Jan 1, 2014 · Gold OA

Improving Vector Space Word Representations Using Multilingual Correlation

Carnegie Mellon University

Indexed in Crossref, DataCite

Abstract

The distributional hypothesis of Harris (1954), according to which the meaning of words is evidenced by the contexts they occur in, has motivated several effective techniques for obtaining vector space semantic representations of words using unannotated text corpora. This paper argues that lexico-semantic content should additionally be invariant across languages and proposes a simple technique based on canonical correlation analysis (CCA) for incorporating multilingual evidence into vectors generated monolingually. We evaluate the resulting word representations on standard lexical semantic evaluation tasks and show that our method produces substantially better semantic representations than monolingual…
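As a rough illustration of the idea in the abstract, the sketch below applies CCA to two sets of vectors with NumPy and projects both into a shared space where paired rows are maximally correlated. It is not the authors' implementation: the function names `cca` and `inv_sqrt`, the regularization term `reg`, the number of retained dimensions `k`, and the synthetic data standing in for vectors of translation pairs are all assumptions made for illustration.

```python
import numpy as np

def inv_sqrt(C):
    """Inverse matrix square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(C)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

def cca(X, Y, k, reg=1e-8):
    """Return projection matrices A, B and the top-k canonical correlations.

    X (n x d1) and Y (n x d2) hold vectors for n aligned word pairs:
    row i of X is assumed to be paired with row i of Y.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)  # center each view
    Yc = Y - Y.mean(axis=0)
    # Covariance matrices, with a small ridge term for numerical stability.
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)
    # Whiten both views, then take the SVD of the whitened cross-covariance.
    Wx, Wy = inv_sqrt(Cxx), inv_sqrt(Cyy)
    U, s, Vt = np.linalg.svd(Wx @ Cxy @ Wy, full_matrices=False)
    A = Wx @ U[:, :k]
    B = Wy @ Vt.T[:, :k]
    return A, B, s[:k]

# Synthetic stand-in for two monolingual embedding sets that share
# a 3-dimensional latent signal (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
n = 500
Z = rng.normal(size=(n, 3))
X = Z @ rng.normal(size=(3, 6)) + 0.1 * rng.normal(size=(n, 6))
Y = Z @ rng.normal(size=(3, 5)) + 0.1 * rng.normal(size=(n, 5))

A, B, corrs = cca(X, Y, k=3)
X_proj = (X - X.mean(axis=0)) @ A  # first view in the shared space
Y_proj = (Y - Y.mean(axis=0)) @ B  # second view in the shared space
print(corrs)
```

In this sketch, `X_proj` plays the role of the monolingual vectors after multilingual evidence has been folded in. How many dimensions to keep and which word pairs to align are choices this record does not specify.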

Citation impact

Total citations: 652
FWCI: 79.91
Percentile: 100%
References: 43

Authors

2

Topics & keywords

Keywords
  • Word (group theory)
  • Computer science
  • Vector space
  • Space (punctuation)
  • Natural language processing
  • Artificial intelligence
  • Correlation
  • Speech recognition
UN Sustainable Development Goals
  • Quality Education