Article · Oct 1, 2017 · Closed access

Multi-modal Factorized Bilinear Pooling with Co-attention Learning for Visual Question Answering

Hangzhou Dianzi University · University of North Carolina at Charlotte · +1 more institution

Indexed in Crossref

Abstract

Visual question answering (VQA) is challenging because it requires a simultaneous understanding of both the visual content of images and the textual content of questions. The approaches used to represent the images and questions in a fine-grained manner and to fuse these multi-modal features play key roles in performance. Bilinear pooling-based models have been shown to outperform traditional linear models for VQA, but their high-dimensional representations and high computational complexity may seriously limit their applicability in practice. For multi-modal feature fusion, here we develop a Multi-modal Factorized Bilinear (MFB) pooling approach to efficiently and effectively combine multi-modal…

Citation impact

Total citations: 716
FWCI: 21.35
Percentile: 100%
References: 54

Authors: 4

Topics & keywords

Keywords
  • Pooling
  • Bilinear interpolation
  • Computer science
  • Artificial intelligence
  • Feature (linguistics)
  • Question answering
  • Modal
  • Representation (politics)