An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA
Microsoft Research (United Kingdom)
Abstract
Knowledge-based visual question answering (VQA) involves answering questions that require external knowledge not present in the image. Existing methods first retrieve knowledge from external resources, then reason over the selected knowledge, the input image, and question for answer prediction. However, this two-step approach could lead to mismatches that potentially limit the VQA performance. For example, the retrieved knowledge might be noisy and irrelevant to the question, and the re-embedded knowledge features during reasoning might deviate from their original meanings in the knowledge base (KB). To address this challenge, we propose PICa, a simple yet effective method that Prompts GPT3 via the use of…
Citation impact
- FWCI: 14.46
- Percentile: 100%
- References: 56
Authors: 7
Topics & keywords
- Computer science
- Question answering
- Context (archaeology)
- Benchmark (surveying)
- Knowledge base
- Artificial intelligence
- Knowledge extraction
- Information retrieval