SCA-CNN: Spatial and Channel-Wise Attention in Convolutional Networks for Image Captioning

Chen, Long; Zhang, Hanwang; Xiao, Jun; Nie, Liqiang; Shao, Jian; Liu, Wei; Chua, Tat‐Seng

doi:10.1109/cvpr.2017.667

Article · Jul 1, 2017 · Closed access

Zhejiang University · Columbia University · +3 more institutions

Indexed in Crossref

Abstract

Visual attention has been successfully applied in structural prediction tasks such as visual captioning and question answering. Existing visual attention models are generally spatial, i.e., the attention is modeled as spatial probabilities that re-weight the last conv-layer feature map of a CNN encoding an input image. However, we argue that such spatial attention does not necessarily conform to the attention mechanism - a dynamic feature extractor that combines contextual fixations over time, as CNN features are naturally spatial, channel-wise and multi-layer. In this paper, we introduce a novel convolutional neural network dubbed SCA-CNN that incorporates Spatial and Channel-wise Attentions in a CNN. In the…
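The two kinds of attention named in the abstract can be illustrated with a small NumPy sketch. This is only an illustration under stated assumptions, not the paper's formulation (the abstract is truncated before the method is described): the additive scoring functions, the mean pooling used to summarize each channel, and all parameter names (`W_v`, `W_h`, `w`, `w_c`, `U_h`, `u`) are hypothetical. Spatial attention produces a probability map over the H×W locations of a feature map; channel-wise attention produces a probability vector over its C channels; both are then used to re-weight the features.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def spatial_attention(V, h, W_v, W_h, w):
    """Spatial probabilities over the H*W locations of V.

    V: (C, H, W) conv feature map; h: (d,) context vector (e.g. a decoder state).
    W_v: (k, C), W_h: (k, d), w: (k,) are hypothetical learned parameters.
    """
    C, H, W = V.shape
    V_flat = V.reshape(C, H * W)                               # (C, m), m = H*W
    scores = w @ np.tanh(W_v @ V_flat + (W_h @ h)[:, None])    # (m,)
    return softmax(scores).reshape(H, W)                       # sums to 1 over locations

def channel_attention(V, h, w_c, U_h, u):
    """Channel-wise probabilities over the C channels of V.

    Each channel is summarized by its spatial mean (an assumption of this sketch).
    w_c: (k,), U_h: (k, d), u: (k,) are hypothetical learned parameters.
    """
    v = V.mean(axis=(1, 2))                                    # (C,) one scalar per channel
    scores = u @ np.tanh(np.outer(w_c, v) + (U_h @ h)[:, None])  # (C,)
    return softmax(scores)                                     # sums to 1 over channels

def reweight(V, alpha, beta):
    """Re-weight the feature map by channel weights beta and spatial weights alpha."""
    return V * beta[:, None, None] * alpha[None, :, :]

# Toy usage with random features and parameters.
rng = np.random.default_rng(0)
C, H, W, d, k = 8, 3, 3, 5, 4
V = rng.standard_normal((C, H, W))
h = rng.standard_normal(d)
alpha = spatial_attention(V, h, rng.standard_normal((k, C)),
                          rng.standard_normal((k, d)), rng.standard_normal(k))
beta = channel_attention(V, h, rng.standard_normal(k),
                         rng.standard_normal((k, d)), rng.standard_normal(k))
X = reweight(V, alpha, beta)   # same shape as V: (C, H, W)
```

In a captioning model the context vector `h` would change at every decoding step, so the weights `alpha` and `beta` would be recomputed for each generated word.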

Citation impact

Total citations: 2,059

FWCI: 70.23
Percentile: 100%
References: 68

Authors: 7

Topics & keywords

Keywords

Closed captioning
Computer science
Convolutional neural network
Artificial intelligence
Feature (linguistics)
Context (archaeology)
Encoding (memory)
Sentence

UN Sustainable Development Goals

Quality Education
