Video Captioning With Attention-Based LSTM and Semantic Consistency
University of Electronic Science and Technology of China · Columbia University
Abstract
Recent progress in using long short-term memory (LSTM) networks for image captioning has motivated the exploration of their application to video captioning. By treating a video as a sequence of features, an LSTM model is trained on video-sentence pairs and learns to associate a video with a sentence. However, most existing methods compress an entire video shot or frame into a static representation, without an attention mechanism that would allow salient features to be selected. Furthermore, existing approaches usually model the translation error but ignore the correlations between sentence semantics and visual content. To tackle these issues, we propose a novel end-to-end framework named aLSTMs, an attention-based…
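The idea of attending to salient features, rather than compressing a video into one static vector, can be illustrated with a minimal sketch of soft attention over per-frame features. This is not the aLSTMs model: the abstract does not specify the scoring function, so the dot-product score, the function name `soft_attention`, and the toy inputs below are all assumptions made for illustration. The query stands in for a decoder state and is assumed to have the same dimension as each frame feature.

```python
import math

def soft_attention(frame_feats, query):
    """Weight each frame by its relevance to the query and return their weighted mean.

    frame_feats: list of per-frame feature vectors (lists of floats, equal length)
    query: vector of the same length, standing in for a decoder state (assumption)
    """
    # Relevance score per frame: dot product with the query (assumed scoring function).
    scores = [sum(f * q for f, q in zip(frame, query)) for frame in frame_feats]

    # Numerically stable softmax turns scores into weights that sum to 1.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Context vector: attention-weighted sum of the frame features.
    dim = len(frame_feats[0])
    context = [
        sum(w * frame[d] for w, frame in zip(weights, frame_feats))
        for d in range(dim)
    ]
    return context, weights

# Toy example: three 2-D "frame features" and a query that favors the first dimension.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]
context, weights = soft_attention(frames, query)
```

In this toy input the first and third frames score equally and receive equal weight, while the second frame is down-weighted. A static representation such as a plain mean would instead treat all three frames alike regardless of the query.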
Citation impact
- FWCI: 27.91
- Percentile: 100%
- References: 71
Authors
- Lianli Gao (corresponding author), University of Electronic Science and Technology of China
- Zhao Guo, University of Electronic Science and Technology of China
- Hanwang Zhang, Columbia University
- Xing Xu, University of Electronic Science and Technology of China
- Heng Tao Shen, University of Electronic Science and Technology of China
Topics & keywords
- Computer science
- Closed captioning
- Sentence
- Artificial intelligence
- Feature (linguistics)
- Natural language processing
- Recurrent neural network
- Benchmark (surveying)