Multi-modal Factorized Bilinear Pooling with Co-attention Learning for Visual Question Answering

Zhou, Yu; Yu, Jun; Fan, Jianping; Tao, Dacheng

doi:10.1109/iccv.2017.202

article · Oct 1, 2017 · Closed access

Yu Zhou · Jun Yu · Jianping Fan · Dacheng Tao

Hangzhou Dianzi University · University of North Carolina at Charlotte · +1 more institution

Indexed in Crossref

Abstract

Visual question answering (VQA) is challenging because it requires a simultaneous understanding of both the visual content of images and the textual content of questions. The approaches used to represent the images and questions in a fine-grained manner and to fuse these multimodal features play key roles in performance. Bilinear pooling based models have been shown to outperform traditional linear models for VQA, but their high-dimensional representations and high computational complexity may seriously limit their applicability in practice. For multimodal feature fusion, here we develop a Multi-modal Factorized Bilinear (MFB) pooling approach to efficiently and effectively combine multi-modal…
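
The abstract is truncated before it describes the method, so the following is only a minimal sketch of the general idea behind a factorized (low-rank) bilinear fusion, not the authors' implementation. It assumes each output element is a bilinear form x^T W_i y whose weight matrix W_i is approximated by a sum of k rank-one terms, so the fusion reduces to two linear projections, an element-wise product, and a sum over groups of k. The function name, the parameter names (U, V, k, o), and the toy dimensions are all hypothetical.

```python
import random

def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(p * q for p, q in zip(a, b))

def factorized_bilinear_pool(x, y, U, V, k):
    """Fuse image feature x and question feature y (illustrative sketch).

    U, V: lists of k*o projection vectors, of lengths len(x) and len(y).
    Returns o values; value i equals x^T W_i y, where W_i is the sum of
    the k rank-one matrices u_j v_j^T in group i.
    """
    # Project both inputs and multiply element-wise: one scalar per (u_j, v_j) pair.
    joint = [dot(u, x) * dot(v, y) for u, v in zip(U, V)]
    # Sum-pool over non-overlapping groups of k to get o outputs.
    return [sum(joint[i:i + k]) for i in range(0, len(joint), k)]

# Toy dimensions (assumed): m-dim image feature, n-dim question feature.
random.seed(0)
m, n, k, o = 6, 5, 3, 4
x = [random.gauss(0, 1) for _ in range(m)]
y = [random.gauss(0, 1) for _ in range(n)]
U = [[random.gauss(0, 1) for _ in range(m)] for _ in range(k * o)]
V = [[random.gauss(0, 1) for _ in range(n)] for _ in range(k * o)]

z = factorized_bilinear_pool(x, y, U, V, k)

# Cross-check output 0 against the explicit (unfactorized) bilinear form.
W0 = [[sum(U[j][a] * V[j][b] for j in range(k)) for b in range(n)] for a in range(m)]
full = sum(x[a] * W0[a][b] * y[b] for a in range(m) for b in range(n))
```

In this sketch the factorized form stores k·o·(m + n) weights, whereas full bilinear pooling would need o·m·n, which illustrates the kind of saving the abstract alludes to when it mentions high-dimensional representations and computational cost.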

Citation impact

Total citations: 716

FWCI: 21.35
Percentile: 100%
References: 54

Authors: 4

Topics & keywords

Keywords

Pooling
Bilinear interpolation
Computer science
Artificial intelligence
Feature (linguistics)
Question answering
Modal
Representation (politics)