Prompting Large Language Models with Answer Heuristics for Knowledge-Based Visual Question Answering
Hangzhou Dianzi University · Hefei University of Technology
Abstract
Knowledge-based visual question answering (VQA) requires external knowledge beyond the image to answer the question. Early studies retrieve the required knowledge from explicit knowledge bases (KBs), which often introduces information irrelevant to the question and thereby restricts the performance of their models. Recent works have sought to use a large language model (i.e., GPT-3 [3]) as an implicit knowledge engine to acquire the necessary knowledge for answering. Despite the encouraging results achieved by these methods, we argue that they have not fully activated the capacity of GPT-3, as the provided input information is insufficient. In this paper, we present Prophet, a conceptually simple framework designed to…
Citation impact
- FWCI: 21.45
- Percentile: 100%
- References: 69
Authors
- 4
Topics & keywords
- Heuristics
- Question answering
- Computer science
- Task (project management)
- Knowledge extraction
- Artificial intelligence
- Information retrieval
- General knowledge
- Quality Education