Mind's eye: A recurrent visual representation for image caption generation

Chen, Xinlei; Zitnick, C. Lawrence

doi:10.1109/cvpr.2015.7298856

Article · Jun 1, 2015 · Closed access

Xinlei Chen · C. Lawrence Zitnick

Carnegie Mellon University · Microsoft (United States)

Indexed in Crossref

Abstract

In this paper we explore the bi-directional mapping between images and their sentence-based descriptions. Critical to our approach is a recurrent neural network that attempts to dynamically build a visual representation of the scene as a caption is being generated or read. The representation automatically learns to remember long-term visual concepts. Our model is capable of both generating novel captions given an image, and reconstructing visual features given an image description. We evaluate our approach on several tasks. These include sentence generation, sentence retrieval and image retrieval. State-of-the-art results are shown for the task of generating novel image descriptions. When compared to human…

Citation impact

527 total citations

FWCI: 46.60
Percentile: 100%
References: 75

Authors: 2

Topics & keywords

Keywords

Computer science
Artificial intelligence
Sentence
Representation (politics)
Task (project management)
Image (mathematics)
Visualization
Natural language processing

UN Sustainable Development Goals

Quality Education

Funding

Nvidia