Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
Indexed in arXiv
Abstract
Inspired by recent work in machine translation and object detection, we introduce an attention-based model that automatically learns to describe the content of images. We describe how we can train this model in a deterministic manner using standard backpropagation techniques and stochastically by maximizing a variational lower bound. We also show through visualization how the model is able to automatically learn to fix its gaze on salient objects while generating the corresponding words in the output sequence. We validate the use of attention with state-of-the-art performance on three benchmark datasets: Flickr8k, Flickr30k and MS COCO.
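The abstract distinguishes a deterministic form of attention (trainable with ordinary backpropagation) from a stochastic one (trained by maximizing a variational lower bound). The plain-Python sketch below illustrates that distinction for a single decoding step: given L annotation vectors a_i from an image encoder and a decoder hidden state h, it computes attention weights alpha, then either averages the annotations ("soft") or samples one location ("hard"). The additive scoring function e_i = v . tanh(W_a a_i + W_h h), the helper names, and the toy dimensions are assumptions chosen for illustration, not the paper's exact parameterization, and the variational training step itself is not shown.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def attention_weights(annotations, hidden, w_a, w_h, v):
    """Score each annotation a_i against the hidden state h (assumed additive form).

    e_i = v . tanh(W_a a_i + W_h h); alpha = softmax(e) over the L locations.
    """
    h_proj = matvec(w_h, hidden)  # shared by every location
    scores = []
    for a in annotations:
        pre = [math.tanh(p + q) for p, q in zip(matvec(w_a, a), h_proj)]
        scores.append(sum(vi * ti for vi, ti in zip(v, pre)))
    return softmax(scores)


def soft_context(annotations, alpha):
    """Deterministic attention: expected context z = sum_i alpha_i * a_i."""
    dim = len(annotations[0])
    return [sum(alpha[i] * annotations[i][d] for i in range(len(annotations)))
            for d in range(dim)]


def hard_context(annotations, alpha, rng):
    """Stochastic attention: sample one location s ~ Categorical(alpha)."""
    s = rng.choices(range(len(annotations)), weights=alpha, k=1)[0]
    return s, list(annotations[s])


# Toy example (hypothetical values): L = 3 locations, feature dim D = 2.
annotations = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hidden = [0.5, -0.5]
w_a = [[1.0, 0.0], [0.0, 1.0]]
w_h = [[0.5, 0.0], [0.0, 0.5]]
v = [1.0, 1.0]

alpha = attention_weights(annotations, hidden, w_a, w_h, v)
z = soft_context(annotations, alpha)
s, z_hard = hard_context(annotations, alpha, random.Random(0))
print("alpha:", [round(x, 3) for x in alpha])
print("soft context:", [round(x, 3) for x in z])
print("hard sample:", s, z_hard)
```

The soft context is a smooth function of alpha, so gradients flow through the attention weights with standard backpropagation. The hard sample is a discrete choice with no gradient through the sampling step, which is why the stochastic variant needs a sampling-based training objective instead.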
Citation impact
- 1,764 total citations
- 38 references
Topics & keywords
Keywords
- Computer science
- Benchmark (surveying)
- Artificial intelligence
- Gaze
- Visualization
- Object (grammar)
- Salient
- Sequence (biology)