Article, Jun 1, 2020, Closed access

X-Linear Attention Networks for Image Captioning

Jingdong (China)

Indexed in Crossref

Abstract

Recent progress on fine-grained visual recognition and visual question answering has featured bilinear pooling, which effectively models the second-order interactions across multi-modal inputs. Nevertheless, there has been no evidence in support of building such interactions concurrently with an attention mechanism for image captioning. In this paper, we introduce a unified attention block, the X-Linear attention block, that fully employs bilinear pooling to selectively capitalize on visual information or perform multi-modal reasoning. Technically, the X-Linear attention block simultaneously exploits both the spatial and channel-wise bilinear attention distributions to capture the second-order interactions between the input…
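The abstract is truncated here, so the following is only a minimal sketch of what a bilinear attention step with both spatial and channel-wise attention distributions might look like, not the authors' implementation. Everything in it is an assumption made for illustration: the function name `bilinear_attention`, the parameter names (`Wq_k`, `Wk`, `Wq_v`, `Wv`, `We`, `ws`, `Wc`), the ReLU activation, the low-rank (element-wise product) form of bilinear pooling, and the random weights. It uses only the Python standard library.

```python
import math
import random

def matvec(W, x):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def relu(x):
    return [max(0.0, v) for v in x]

def sigmoid(x):
    return [1.0 / (1.0 + math.exp(-v)) for v in x]

def softmax(x):
    m = max(x)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in x]
    s = sum(e)
    return [v / s for v in e]

def hadamard(a, b):
    return [p * q for p, q in zip(a, b)]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def bilinear_attention(query, keys, values, p):
    """Hypothetical bilinear attention step (illustrative assumption only)."""
    # Second-order interactions: project both inputs into a joint space and
    # take an element-wise product (low-rank bilinear pooling).
    q_k = relu(matvec(p["Wq_k"], query))
    q_v = relu(matvec(p["Wq_v"], query))
    B_k = [hadamard(relu(matvec(p["Wk"], k)), q_k) for k in keys]
    B_v = [hadamard(relu(matvec(p["Wv"], v)), q_v) for v in values]
    # Embed the query-key representations before computing attention.
    E = [relu(matvec(p["We"], b)) for b in B_k]
    # Spatial attention: one softmax-normalized weight per region.
    beta_s = softmax([matvec(p["ws"], e)[0] for e in E])
    # Channel-wise attention: average over regions, then a sigmoid gate.
    pooled = [sum(col) / len(E) for col in zip(*E)]
    beta_c = sigmoid(matvec(p["Wc"], pooled))
    # Spatially weighted sum of query-value representations, gated per channel.
    attended = [sum(w * b[j] for w, b in zip(beta_s, B_v))
                for j in range(len(beta_c))]
    return hadamard(beta_c, attended), beta_s, beta_c

# Toy dimensions and random inputs (all hypothetical).
rng = random.Random(0)
d_q, d_k, d_v, d_b, d_e, n_regions = 4, 5, 5, 6, 3, 7
params = {
    "Wq_k": rand_matrix(d_b, d_q, rng), "Wk": rand_matrix(d_b, d_k, rng),
    "Wq_v": rand_matrix(d_b, d_q, rng), "Wv": rand_matrix(d_b, d_v, rng),
    "We": rand_matrix(d_e, d_b, rng), "ws": rand_matrix(1, d_e, rng),
    "Wc": rand_matrix(d_b, d_e, rng),
}
query = [rng.uniform(-1, 1) for _ in range(d_q)]
keys = [[rng.uniform(-1, 1) for _ in range(d_k)] for _ in range(n_regions)]
values = [[rng.uniform(-1, 1) for _ in range(d_v)] for _ in range(n_regions)]
out, beta_s, beta_c = bilinear_attention(query, keys, values, params)
```

In this sketch, `beta_s` is a distribution over the image regions and sums to 1, while `beta_c` gates each channel of the attended feature independently with a value between 0 and 1. The element-wise product of the projected query with the projected keys or values is what supplies the second-order interaction; a conventional attention block would instead compute only a first-order (additive or dot-product) score.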

Citation impact

Total citations: 685
FWCI: 42.37
Percentile: 100%
References: 67

Authors

4

Topics & keywords

Keywords
  • Closed captioning
  • Pooling
  • Computer science
  • Bilinear interpolation
  • Block (permutation group theory)
  • Sentence
  • Artificial intelligence
  • Theoretical computer science
UN Sustainable Development Goals
  • Quality Education