Article · Jun 1, 2019 · Closed access

Deep Modular Co-Attention Networks for Visual Question Answering

Hangzhou Dianzi University · Hồng Đức University · +4 more institutions

Indexed in Crossref

Abstract

Visual Question Answering (VQA) requires a fine-grained and simultaneous understanding of both the visual content of images and the textual content of questions. Therefore, designing an effective 'co-attention' model to associate key words in questions with key objects in images is central to VQA performance. So far, most successful attempts at co-attention learning have been achieved by using shallow models, and deep co-attention models show little improvement over their shallow counterparts. In this paper, we propose a deep Modular Co-Attention Network (MCAN) that consists of Modular Co-Attention (MCA) layers cascaded in depth. Each MCA layer models the self-attention of questions and images, as well as the…
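The abstract is cut off here, so the following is only a rough sketch of the one mechanism it names in full: self-attention applied separately to question-word features and image-region features, with layers cascaded in depth. It is not the authors' implementation. The function names (`softmax`, `self_attention`, `cascade`), the toy feature vectors, and the use of plain scaled dot-product attention without learned projections are all assumptions made for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features):
    """Scaled dot-product self-attention over a list of feature vectors.

    Assumption: queries, keys, and values are the raw features themselves
    (no learned projection matrices), to keep the sketch minimal.
    """
    d = len(features[0])
    attended = []
    for query in features:
        # Similarity of this item to every item in the same modality.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
            for key in features
        ]
        weights = softmax(scores)
        # Weighted average of all feature vectors.
        attended.append(
            [sum(w * v[j] for w, v in zip(weights, features)) for j in range(d)]
        )
    return attended


def cascade(features, depth):
    """Apply self-attention `depth` times in sequence (hypothetical helper)."""
    for _ in range(depth):
        features = self_attention(features)
    return features


# Toy inputs (made up): three question-word vectors, two image-region vectors.
question_words = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
image_regions = [[0.5, 0.5], [1.0, 0.0]]

question_out = cascade(question_words, depth=2)
image_out = cascade(image_regions, depth=2)
```

Each output vector is a weighted average of the inputs from the same modality, so the number of items and the feature dimension are preserved from layer to layer.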

Citation impact

Total citations: 920
FWCI: 44.50
Percentile: 100%
References: 54

Authors

5

Topics & keywords

Keywords
  • Modular design
  • Computer science
  • Question answering
  • Benchmark (surveying)
  • Attention network
  • Key (lock)
  • Artificial intelligence
  • Deep learning
UN Sustainable Development Goals
  • Quality Education