Article · IEEE Transactions on Multimedia · Jul 19, 2017 · Closed access

Video Captioning With Attention-Based LSTM and Semantic Consistency

University of Electronic Science and Technology of China · Columbia University

Indexed in Crossref

Abstract

Recent progress in using long short-term memory (LSTM) for image captioning has motivated the exploration of its applications to video captioning. By treating a video as a sequence of features, an LSTM model is trained on video-sentence pairs and learns to associate a video with a sentence. However, most existing methods compress an entire video shot or frame into a static representation, without considering an attention mechanism that allows for selecting salient features. Furthermore, existing approaches usually model the translation error but ignore the correlations between sentence semantics and visual content. To tackle these issues, we propose a novel end-to-end framework named aLSTMs, an attention-based…
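The abstract contrasts compressing a video into one static vector with an attention mechanism that selects salient features. The following minimal sketch illustrates generic soft temporal attention: at one decoding step, each frame feature is scored against the decoder's hidden state, the scores are normalized with a softmax, and the frames are averaged with those weights into a context vector. It is not the aLSTMs implementation. The abstract does not give the scoring function, so plain dot-product scoring is assumed here in place of a learned score. The function names `softmax` and `soft_attention` are hypothetical, and frame features and hidden state are assumed to share one dimension.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(
    frames: Sequence[Sequence[float]], hidden: Sequence[float]
) -> Tuple[Vector, Vector]:
    """Weight per-frame features by their relevance to the decoder state.

    Assumption: dot-product scoring with frames and hidden state in the same
    dimension (a stand-in for a learned scoring function).
    Returns (attention weights over frames, attended context vector).
    """
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(hidden)
    if any(len(f) != dim for f in frames):
        raise ValueError("frame and hidden dimensions must match")
    # Relevance score of each frame for the current decoding step.
    scores = [sum(a * b for a, b in zip(f, hidden)) for f in frames]
    weights = softmax(scores)
    # Context vector: attention-weighted average of the frame features.
    context = [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]
    return weights, context


if __name__ == "__main__":
    # Three toy 2-D frame features and one decoder hidden state.
    frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    hidden = [2.0, 0.0]
    weights, context = soft_attention(frames, hidden)
    print([round(w, 3) for w in weights])
    print([round(c, 3) for c in context])
```

In this toy input the hidden state aligns with the first feature dimension, so the first and third frames receive equal, larger weights and the second frame is largely ignored. A static mean-pooled representation would instead weight all three frames equally at every decoding step.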

