Jointly Modeling Embedding and Translation to Bridge Video and Language

Pan, Yingwei; Mei, Tao; Yao, Ting; Li, Houqiang; Rui, Yong

doi:10.1109/cvpr.2016.497

Article · Jun 1, 2016 · Closed access

University of Science and Technology of China · Microsoft (United States) · +1 more institution

Indexed in Crossref

Abstract

Automatically describing video content with natural language is a fundamental challenge of computer vision. Recurrent Neural Networks (RNNs), which model sequence dynamics, have attracted increasing attention in visual interpretation. However, most existing approaches generate each word locally from the preceding words and the visual content, without holistically exploiting the relationship between sentence semantics and visual content. As a result, the generated sentences may be contextually correct while their semantics (e.g., subjects, verbs or objects) are wrong. This paper presents a novel unified framework, named Long Short-Term Memory with visual-semantic Embedding (LSTM-E), which can…

Citation impact

Total citations: 597

FWCI: 48.32
Percentile: 100%
References: 87

Authors: 5

Topics & keywords

Keywords

Computer science
Natural language processing
Artificial intelligence
Sentence
Recurrent neural network
Semantics (computer science)
Embedding
Word embedding

UN Sustainable Development Goals

Quality Education
