Multiscale Feature Extraction and Fusion of Image and Text in VQA

University of Electronic Science and Technology of China · Wenzhou University · +3 more institutions

Indexed in: Crossref, DOAJ

Abstract

Visual Question Answering (VQA) is the task of finding the information in an image that is relevant to a question in order to answer that question correctly. It has broad applications in visual assistance, automated security surveillance, and intelligent interaction between robots and humans. However, VQA accuracy remains unsatisfactory. The main difficulties are that image features do not represent scene and object information well, and that text information is not fully represented. This paper applies multi-scale feature extraction and fusion methods to both the image feature characterization and the text information representation sections of the VQA system…

Citation impact

Total citations: 413
FWCI: 25.80
Percentile: 100%
References: 41

Authors

6

Topics & keywords

Keywords
  • Computer science
  • Artificial intelligence
  • Feature extraction
  • Pattern recognition (psychology)
  • Feature (linguistics)
  • Fuse (electrical)
  • Representation (politics)
  • Image (mathematics)
UN Sustainable Development Goals
  • Peace, Justice and Strong Institutions