Grounded Compositional Semantics for Finding and Describing Images with Sentences

Socher, Richard; Karpathy, Andrej; Le, Quoc V.; Manning, Christopher D.; Ng, Andrew Y.

doi:10.1162/tacl_a_00177

Article · Transactions of the Association for Computational Linguistics · Dec 1, 2014 · Diamond OA

Laboratoire d'Informatique de Paris-Nord · Stanford University · +1 more institution

Indexed in: Crossref, DOAJ

Abstract

Previous work on Recursive Neural Networks (RNNs) shows that these models can produce compositional feature vectors for accurately representing and classifying sentences or images. However, the sentence vectors of previous models cannot accurately represent visually grounded meaning. We introduce the DT-RNN model which uses dependency trees to embed sentences into a vector space in order to retrieve images that are described by those sentences. Unlike previous RNN-based models which use constituency trees, DT-RNNs naturally focus on the action and agents in a sentence. They are better able to abstract from the details of word order and syntactic expression. DT-RNNs outperform other recursive and recurrent…
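The idea summarized in the abstract, composing a sentence vector along a dependency tree and then ranking images against it, can be illustrated with a small sketch. This is not the paper's formulation: the averaging rule, the single shared child matrix, the cosine scoring, and every name below (`Node`, `compose`, `rank_images`, the toy embeddings and image vectors) are assumptions made only for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """One word in a dependency tree (hypothetical structure)."""
    word: str
    children: list = field(default_factory=list)


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def compose(node, emb, w_word, w_child):
    """Return a vector for the subtree rooted at `node`.

    Assumed rule: transform the head word, add the transformed child
    vectors, average, and apply tanh.
    """
    total = matvec(w_word, emb[node.word])
    for child in node.children:
        child_vec = compose(child, emb, w_word, w_child)
        total = [t + c for t, c in zip(total, matvec(w_child, child_vec))]
    n = 1 + len(node.children)
    return [math.tanh(t / n) for t in total]


def cosine(u, v):
    """Cosine similarity, returning 0.0 for a zero vector."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_images(sentence_vec, image_vecs):
    """Sort image names by similarity to the sentence vector, best first."""
    return sorted(
        image_vecs,
        key=lambda name: cosine(sentence_vec, image_vecs[name]),
        reverse=True,
    )


# Toy data (all values invented): "dog chases ball", rooted at the verb.
emb = {"dog": [1.0, 0.0], "chases": [0.0, 1.0], "ball": [1.0, 1.0]}
identity = [[1.0, 0.0], [0.0, 1.0]]
tree = Node("chases", [Node("dog"), Node("ball")])

sentence_vec = compose(tree, emb, identity, identity)
image_vecs = {"img_a": [1.0, 1.0], "img_b": [-1.0, 0.0]}
ranking = rank_images(sentence_vec, image_vecs)
```

Because the verb sits at the root of the dependency tree, it is combined last and directly shapes the final sentence vector, which is one way to read the abstract's remark that the model focuses on the action and its agents.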

Citation impact

Total citations: 827

FWCI: 84.03
Percentile: 100%
References: 42

Authors: 5

Topics & keywords

Keywords

Computer science
Sentence
Recurrent neural network
Artificial intelligence
Natural language processing
Dependency (UML)
Word (group theory)
Feature (linguistics)

UN Sustainable Development Goals

Quality Education
