Video Captioning With Attention-Based LSTM and Semantic Consistency

Gao, Lianli; Guo, Zhao; Zhang, Hanwang; Xu, Xing; Shen, Heng Tao

doi:10.1109/tmm.2017.2729019

Article · IEEE Transactions on Multimedia · Jul 19, 2017 · Closed access

University of Electronic Science and Technology of China · Columbia University

Indexed in Crossref

Abstract

Recent progress in using long short-term memory (LSTM) for image captioning has motivated the exploration of its application to video captioning. By treating a video as a sequence of features, an LSTM model is trained on video-sentence pairs and learns to associate a video with a sentence. However, most existing methods compress an entire video shot or frame into a static representation, without an attention mechanism that allows salient features to be selected. Furthermore, existing approaches usually model the translation error but ignore the correlations between sentence semantics and visual content. To tackle these issues, we propose a novel end-to-end framework named aLSTMs, an attention-based…
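
The attention idea the abstract refers to, selecting salient frame features rather than compressing the whole video into one static vector, can be illustrated with a minimal sketch. This is not the aLSTMs model itself: the dot-product scoring, the function name `soft_attention`, and the toy `frames` and `query` values are all assumptions made for illustration, and the scoring function the paper uses is not stated in the abstract.

```python
import math

def soft_attention(frame_features, query):
    """Return (weights, context) for dot-product soft attention.

    Hypothetical helper: dot-product scoring is an assumption,
    not the scoring function of the paper.
    """
    # Relevance score of each frame: dot product with the query vector
    # (the query stands in for a decoder hidden state).
    scores = [sum(f * q for f, q in zip(frame, query)) for frame in frame_features]
    # Numerically stable softmax over frames.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: attention-weighted sum of the frame features.
    dim = len(frame_features[0])
    context = [
        sum(w * frame[d] for w, frame in zip(weights, frame_features))
        for d in range(dim)
    ]
    return weights, context

# Toy input: three frames with 2-dimensional features (made-up values).
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]
weights, context = soft_attention(frames, query)
```

Here the weights sum to 1, and frames whose features align with the query receive more weight, so the context vector changes with the query instead of being a fixed summary of the video.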

Citation impact

Total citations: 652
FWCI: 27.91
Percentile: 100%
References: 71

Authors: 5

Topics & keywords

Keywords

Computer science
Closed captioning
Sentence
Artificial intelligence
Feature (linguistics)
Natural language processing
Recurrent neural network
Benchmark (surveying)