Article · Jun 1, 2020 · Closed access
X-Linear Attention Networks for Image Captioning
Indexed in Crossref
Abstract
Recent progress on fine-grained visual recognition and visual question answering has featured bilinear pooling, which effectively models the 2nd order interactions across multi-modal inputs. Nevertheless, there has been no evidence in support of building such interactions concurrently with attention mechanisms for image captioning. In this paper, we introduce a unified attention block, the X-Linear attention block, that fully employs bilinear pooling to selectively capitalize on visual information or perform multi-modal reasoning. Technically, the X-Linear attention block simultaneously exploits both the spatial and channel-wise bilinear attention distributions to capture the 2nd order interactions between the input…
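The combination of bilinear pooling with spatial and channel-wise attention named in the abstract can be illustrated with a minimal sketch. It is a simplified illustration, not the paper's exact formulation: the function name `bilinear_attention`, the parameter names, the ReLU nonlinearity, the dimensions, and the random weights are all assumptions chosen for the example.

```python
import math
import random


def relu(x):
    return [max(0.0, v) for v in x]


def matvec(W, x):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def hadamard(a, b):
    return [x * y for x, y in zip(a, b)]


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def bilinear_attention(query, keys, values, params):
    """Hypothetical helper: attend over regions using bilinear query-key features."""
    # Low-rank bilinear pooling: project both inputs, then multiply element-wise,
    # which yields 2nd order (multiplicative) interactions between them.
    q_k = relu(matvec(params["Wq_k"], query))
    q_v = relu(matvec(params["Wq_v"], query))
    joint_k = [hadamard(relu(matvec(params["Wk"], k)), q_k) for k in keys]
    joint_v = [hadamard(relu(matvec(params["Wv"], v)), q_v) for v in values]

    # Spatial attention: one scalar score per region, normalized with softmax.
    scores = [sum(w * b for w, b in zip(params["w_s"], jk)) for jk in joint_k]
    spatial = softmax(scores)

    # Channel-wise attention: average over regions, then one sigmoid gate per channel.
    n = len(joint_k)
    squeezed = [sum(jk[c] for jk in joint_k) / n for c in range(len(joint_k[0]))]
    channel = [sigmoid(s) for s in matvec(params["W_c"], squeezed)]

    # Attended feature: spatially weighted sum of joint values, gated per channel.
    pooled = [sum(a * jv[c] for a, jv in zip(spatial, joint_v))
              for c in range(len(channel))]
    return hadamard(channel, pooled), spatial, channel


# Toy usage with assumed sizes: 5 image regions, 4-dim inputs, 3-dim joint space.
rng = random.Random(0)
d_in, d, n_regions = 4, 3, 5
params = {
    "Wq_k": rand_matrix(rng, d, d_in),
    "Wq_v": rand_matrix(rng, d, d_in),
    "Wk": rand_matrix(rng, d, d_in),
    "Wv": rand_matrix(rng, d, d_in),
    "w_s": [rng.uniform(-0.5, 0.5) for _ in range(d)],
    "W_c": rand_matrix(rng, d, d),
}
query = [rng.uniform(-1.0, 1.0) for _ in range(d_in)]
regions = [[rng.uniform(-1.0, 1.0) for _ in range(d_in)] for _ in range(n_regions)]
attended, spatial, channel = bilinear_attention(query, regions, regions, params)
```

Here the spatial weights form a distribution over regions, while the channel gates independently scale each dimension of the pooled feature. A conventional attention block would instead compute only the spatial weights from a linear (1st order) fusion of query and key.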
Citation impact
- Total citations: 685
- FWCI: 42.37
- Percentile: 100%
- References: 67
Authors: 4
Topics & keywords
Keywords
- Closed captioning
- Pooling
- Computer science
- Bilinear interpolation
- Block (permutation group theory)
- Sentence
- Artificial intelligence
- Theoretical computer science
UN Sustainable Development Goals
- Quality Education