Multiscale Feature Extraction and Fusion of Image and Text in VQA
University of Electronic Science and Technology of China · Wenzhou University · 3 other institutions
Abstract
Visual Question Answering (VQA) is the task of extracting information relevant to a question from an image in order to answer that question correctly. It has broad applications in visual assistance, automated security surveillance, and intelligent human-robot interaction. However, VQA accuracy remains unsatisfactory. The main difficulties are that image features do not adequately represent scene and object information, and that the textual information is not fully represented. This paper applies multi-scale feature extraction and fusion methods to the image feature representation and the text representation components of the VQA system, respectively…
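The abstract names multi-scale extraction and fusion for both modalities but does not say how they are implemented. The sketch below is only an illustration under stated assumptions, not the authors' method: it assumes image features arrive as an H x W grid of C-dimensional vectors pooled at several grid scales (spatial-pyramid style), text arrives as a sequence of word embeddings pooled over n-gram windows of several lengths, and the two modalities are fused by an element-wise (Hadamard) product, a common VQA fusion choice. All function names and the scale settings are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]
Grid = List[List[Vector]]  # H x W grid of C-dim feature vectors (assumed input)


def mean_vectors(vs: Sequence[Vector]) -> Vector:
    """Element-wise mean of equal-length vectors."""
    n = len(vs)
    return [sum(col) / n for col in zip(*vs)]


def max_vectors(vs: Sequence[Vector]) -> Vector:
    """Element-wise max of equal-length vectors."""
    return [max(col) for col in zip(*vs)]


def image_multiscale(grid: Grid, scales: Sequence[int] = (1, 2)) -> List[Vector]:
    """Hypothetical multi-scale image features: for each scale s, split the
    grid into s x s cells, average each cell, then max-pool over the cells.
    Returns one C-dim vector per scale. Assumes H >= s and W >= s."""
    h, w = len(grid), len(grid[0])
    per_scale = []
    for s in scales:
        cell_means = []
        for i in range(s):
            for j in range(s):
                rows = range(i * h // s, (i + 1) * h // s)
                cols = range(j * w // s, (j + 1) * w // s)
                cell = [grid[r][c] for r in rows for c in cols]
                cell_means.append(mean_vectors(cell))
        per_scale.append(max_vectors(cell_means))
    return per_scale


def text_multiscale(tokens: List[Vector], ns: Sequence[int] = (1, 2, 3)) -> List[Vector]:
    """Hypothetical multi-scale text features: for each window length n,
    average every run of n consecutive word embeddings, then max-pool over
    the windows. n is capped at the sequence length."""
    per_scale = []
    for n in ns:
        n = min(n, len(tokens))
        windows = [mean_vectors(tokens[k:k + n]) for k in range(len(tokens) - n + 1)]
        per_scale.append(max_vectors(windows))
    return per_scale


def fuse(img_scales: List[Vector], txt_scales: List[Vector]) -> Vector:
    """Average the scales within each modality, then fuse the two modalities
    with an element-wise (Hadamard) product. Assumes both use the same dim."""
    img = mean_vectors(img_scales)
    txt = mean_vectors(txt_scales)
    return [a * b for a, b in zip(img, txt)]


if __name__ == "__main__":
    grid = [[[1, 0], [3, 0]], [[5, 2], [7, 2]]]  # toy 2 x 2 grid, C = 2
    tokens = [[1, 2], [3, 0], [0, 4]]            # toy 3-word question, d = 2
    print(fuse(image_multiscale(grid), text_multiscale(tokens)))
```

Max-pooling over cells and windows keeps the strongest local response at each scale, which is one simple way to let coarse and fine views contribute; a trained model would typically replace the fixed pooling and product with learned projections.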
Citation impact
- FWCI (field-weighted citation impact): 25.80
- Percentile: 100%
- References: 41
Authors
- Number of authors: 6
Topics & keywords
- Computer science
- Artificial intelligence
- Feature extraction
- Pattern recognition