Benchmarking Retrieval-Augmented Generation for Medicine

Xiong, Guangzhi; Jin, Qiao; Lu, Zhiyong; Zhang, Aidong

doi:10.18653/v1/2024.findings-acl.372

Article, Jan 1, 2024, Gold OA

Indexed in Crossref

Abstract

While large language models (LLMs) have achieved state-of-the-art performance on a wide range of medical question answering (QA) tasks, they still face challenges with hallucinations and outdated knowledge. Retrieval-augmented generation (RAG) is a promising solution and has been widely adopted. However, a RAG system can involve multiple flexible components, and there is a lack of best practices regarding the optimal RAG setting for various medical purposes. To systematically evaluate such systems, we propose the Medical Information Retrieval-Augmented Generation Evaluation (MIRAGE), a first-of-its-kind benchmark including 7,663 questions from five medical QA datasets. Using MIRAGE, we conducted large-scale…

Citation impact

Total citations: 197

FWCI: 61.74
Percentile: 100%
References: 0

Authors: 4

Topics & keywords

Keywords

Benchmarking
Computer science
Information retrieval
Artificial intelligence
Natural language processing
Business
