Multiscale Feature Extraction and Fusion of Image and Text in VQA

Lu, Siyu; Ding, Yueming; Liu, Mingzhe; Yin, Zhengtong; Yin, Lirong; Zheng, Wenfeng

doi:10.1007/s44196-023-00233-6

Article · International Journal of Computational Intelligence Systems · Apr 11, 2023 · Gold OA

University of Electronic Science and Technology of China · Wenzhou University · +3 more institutions

Indexed in: Crossref, DOAJ

Abstract

Visual Question Answering (VQA) is the task of finding the information in an image that is relevant to a question in order to answer that question correctly. It has broad applications in visual assistance, automated security surveillance, and intelligent interaction between robots and humans. However, VQA accuracy remains unsatisfactory. The main research difficulty is that image features do not represent scene and object information well, and text information is not fully represented. This paper applies multi-scale feature extraction and fusion methods to both the image feature characterization and the text information representation parts of the VQA system…

Citation impact

Total citations: 413
FWCI: 25.80
Percentile: 100%
References: 41

Authors: 6

Topics & keywords

Keywords

Computer science
Artificial intelligence
Feature extraction
Pattern recognition (psychology)
Feature (linguistics)
Fuse (electrical)
Representation (politics)
Image (mathematics)

UN Sustainable Development Goals

Peace, Justice and Strong Institutions
