Video Paragraph Captioning Using Hierarchical Recurrent Neural Networks
Purdue University West Lafayette · Baidu (China) · +1 more institution
Abstract
We present an approach that exploits hierarchical Recurrent Neural Networks (RNNs) to tackle the video captioning problem, i.e., generating one or multiple sentences to describe a realistic video. Our hierarchical framework contains a sentence generator and a paragraph generator. The sentence generator produces one simple short sentence that describes a specific short video interval. It exploits both temporal- and spatial-attention mechanisms to selectively focus on visual elements during generation. The paragraph generator captures the inter-sentence dependency by taking as input the sentential embedding produced by the sentence generator, combining it with the paragraph history, and outputting the new initial…
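The two-level loop described in the abstract can be illustrated with a small toy sketch. It is not the authors' implementation: all weights are random and untrained, and every name, size, and formula below is hypothetical. Two assumptions deserve emphasis. First, the abstract is cut off before saying what the paragraph generator outputs; the sketch assumes its state is used to initialize the sentence generator for the next sentence. Second, a single soft attention over a flat set of feature vectors stands in for the separate temporal and spatial attention mechanisms, and the mean of the sentence generator's hidden states stands in for the sentential embedding.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy sizes and token ids; all hypothetical.
D_FEAT, D_HID, VOCAB, MAX_LEN = 8, 6, 10, 5
EOS, BOS = 0, 1


def init(*shape):
    """Small random (untrained) weight matrix."""
    return rng.normal(scale=0.1, size=shape)


E = init(VOCAB, D_HID)     # word embeddings
W_h = init(D_HID, D_HID)   # sentence RNN recurrence
W_c = init(D_HID, D_FEAT)  # attended visual context -> hidden
W_a = init(D_HID, D_FEAT)  # attention scoring
W_o = init(VOCAB, D_HID)   # hidden -> word logits
U_p = init(D_HID, D_HID)   # paragraph RNN recurrence
U_s = init(D_HID, D_HID)   # sentence embedding -> paragraph state


def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()


def attend(features, h):
    """Soft attention: weight each feature vector by its relevance to state h."""
    weights = softmax(features @ (W_a.T @ h))  # one weight per feature vector
    return weights @ features, weights         # context vector, weights


def generate_sentence(features, h0):
    """Greedily decode one sentence; return word ids and a sentence embedding."""
    h, prev, words, states = h0, BOS, [], []
    for _ in range(MAX_LEN):
        context, _ = attend(features, h)
        h = np.tanh(W_h @ h + W_c @ context + E[prev])
        states.append(h)
        prev = int(np.argmax(W_o @ h))
        words.append(prev)
        if prev == EOS:
            break
    # Assumption: mean hidden state serves as the sentential embedding.
    return words, np.mean(states, axis=0)


def generate_paragraph(features, n_sentences=3):
    """Outer loop: carry paragraph history across sentences."""
    p = np.zeros(D_HID)   # paragraph history
    h0 = np.zeros(D_HID)  # initial state of the sentence generator
    paragraph = []
    for _ in range(n_sentences):
        words, s = generate_sentence(features, h0)
        paragraph.append(words)
        p = np.tanh(U_p @ p + U_s @ s)  # combine embedding with history
        h0 = p  # assumption: paragraph state seeds the next sentence
    return paragraph


video = rng.normal(size=(4, D_FEAT))  # 4 fake feature vectors for one video
paragraph = generate_paragraph(video)
print(paragraph)  # list of word-id lists, one per sentence
```

Because the weights are untrained, the printed word ids are meaningless; the point is only the control flow, in which each sentence is decoded with attention over visual features and the paragraph-level state carries context from one sentence to the next.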
Citation impact
- FWCI: 48.15
- Percentile: 100%
- References: 85
Authors: 5
Topics & keywords
- Computer science
- Sentence
- Paragraph
- Closed captioning
- Generator (circuit theory)
- Benchmark (surveying)
- Artificial intelligence
- Exploit