Deep Modular Co-Attention Networks for Visual Question Answering

Zhou, Yu; Yu, Jun; Cui, Yuhao; Tao, Dacheng; Tian, Qi

doi:10.1109/cvpr.2019.00644

Article · Jun 1, 2019 · Closed access

Yu Zhou, Jun Yu, Yuhao Cui, Dacheng Tao, Qi Tian

Hangzhou Dianzi University · Hồng Đức University · +4 more institutions

Indexed in Crossref

Abstract

Visual Question Answering (VQA) requires a fine-grained and simultaneous understanding of both the visual content of images and the textual content of questions. Therefore, designing an effective "co-attention" model to associate key words in questions with key objects in images is central to VQA performance. So far, most successful attempts at co-attention learning have been achieved by using shallow models, and deep co-attention models show little improvement over their shallow counterparts. In this paper, we propose a deep Modular Co-Attention Network (MCAN) that consists of Modular Co-Attention (MCA) layers cascaded in depth. Each MCA layer models the self-attention of questions and images, as well as the…
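The abstract describes layers cascaded in depth, each applying self-attention to question features and to image features. The following is a minimal, illustrative sketch of that idea in plain Python, not the authors' implementation. All function names (`attend`, `co_attention_layer`, `cascade`) are hypothetical. The cross-modal step is an assumption: the abstract is truncated before it specifies that part, so the sketch uses one common way to let image regions attend to question words. Learned projections, multiple heads, normalization, and feed-forward sublayers are omitted.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Scaled dot-product attention over lists of feature vectors."""
    dim = len(keys[0])
    width = len(values[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(width)])
    return out


def add(x, y):
    """Element-wise sum of two equally shaped lists of vectors (residual connection)."""
    return [[a + b for a, b in zip(r, s)] for r, s in zip(x, y)]


def co_attention_layer(question, image):
    """One simplified layer: self-attention per modality, then a cross-modal step."""
    # Self-attention of questions and of images, as stated in the abstract.
    question = add(question, attend(question, question, question))
    image = add(image, attend(image, image, image))
    # ASSUMPTION: image regions attend to question words. The truncated
    # abstract does not spell this step out; it is included for illustration.
    image = add(image, attend(image, question, question))
    return question, image


def cascade(question, image, depth=2):
    """Stack `depth` layers, feeding each layer's output into the next."""
    for _ in range(depth):
        question, image = co_attention_layer(question, image)
    return question, image


# Toy inputs: 3 word vectors and 2 region vectors, each of dimension 2.
question = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
image = [[0.5, 0.5], [1.0, -1.0]]
q_out, v_out = cascade(question, image, depth=2)
print(len(q_out), len(v_out))  # sequence lengths are preserved: 3 2
```

Because every step adds an attention output back onto its input, the number of word and region vectors and their dimensionality stay fixed, which is what allows such layers to be stacked to arbitrary depth.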

Citation impact

Total citations: 920

FWCI: 44.50
Percentile: 100%
References: 54

Authors: 5

Topics & keywords

Keywords

Modular design
Computer science
Question answering
Benchmark (surveying)
Attention network
Key (lock)
Artificial intelligence
Deep learning

UN Sustainable Development Goals

Quality Education
