Detecting and Preventing Hallucinations in Large Vision Language Models

Gunjal, Anisha; Yin, Jihan; Bas, Erhan

doi:10.1609/aaai.v38i16.29771

Article, Proceedings of the AAAI Conference on Artificial Intelligence, Mar 24, 2024 (Diamond open access)

Meso Scale Discovery (United States)

Indexed in Crossref

Abstract

Instruction-tuned Large Vision Language Models (LVLMs) have advanced significantly in generalizing across a diverse set of multimodal tasks, especially Visual Question Answering (VQA). However, generating detailed responses that are visually grounded remains challenging for these models. We find that even the current state-of-the-art LVLMs (InstructBLIP) still produce a staggering 30 percent hallucinatory text, in the form of non-existent objects, unfaithful descriptions, and inaccurate relationships. To address this, we introduce M-HalDetect, a Multimodal Hallucination Detection Dataset that can be used to train and benchmark models for hallucination detection and prevention. M-HalDetect…
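
The abstract does not say how the 30 percent figure is measured. As a hedged illustration only, the sketch below assumes each model response has been split into annotated segments, each flagged as grounded or hallucinated, and reports the share of characters that fall in hallucinated segments. The `Segment` type, its `hallucinated` flag, and the character-level weighting are hypothetical choices made for this example, not the M-HalDetect annotation scheme.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Segment:
    """One annotated span of a model response (hypothetical schema)."""
    text: str
    hallucinated: bool  # assumed label: True if the span is not grounded in the image


def hallucinated_fraction(segments: list[Segment]) -> float:
    """Return the share of response characters that lie in hallucinated spans."""
    total = sum(len(s.text) for s in segments)
    if total == 0:
        return 0.0  # an empty response contains no hallucinated text
    flagged = sum(len(s.text) for s in segments if s.hallucinated)
    return flagged / total


# Usage: a two-segment response whose second span mentions a non-existent object.
response = [
    Segment("There is a dog on the couch", hallucinated=False),
    Segment(" next to a sleeping cat.", hallucinated=True),
]
print(f"{hallucinated_fraction(response):.0%} of characters flagged")
```

Weighting by characters rather than by words or sentences is an arbitrary choice here; counting a different unit would change the reported percentage.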

Citation impact

Total citations: 122

FWCI: 62.25
Percentile: 100%
References: 27

Authors: 3

Topics & keywords

Keywords

Visual Hallucination
Psychology
Artificial intelligence
Cognitive psychology
Computer science
Computer vision
Psychiatry

UN Sustainable Development Goals

Quality Education

No related works found for this paper.