Benchmarking Large Language Models in Retrieval-Augmented Generation

Chen, Jiawei; Lin, Hongyu; Han, Xianpei; Sun, Le

doi:10.1609/aaai.v38i16.29728

Article · Proceedings of the AAAI Conference on Artificial Intelligence · Mar 24, 2024 · Diamond OA

Institute of Software · University of Chinese Academy of Sciences · +2 more institutions

Indexed in Crossref

Abstract

Retrieval-Augmented Generation (RAG) is a promising approach for mitigating hallucination in large language models (LLMs). However, existing research lacks a rigorous evaluation of how retrieval-augmented generation affects different large language models, which makes it challenging to identify potential bottlenecks in the RAG capabilities of different LLMs. In this paper, we systematically investigate the impact of Retrieval-Augmented Generation on large language models. We analyze the performance of different large language models on four fundamental abilities required for RAG: noise robustness, negative rejection, information integration, and counterfactual robustness. To this end, we…
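The abstract names these abilities but is cut off before describing how they are measured. As a rough illustration only, here is one way a noise-robustness test case could be assembled: for each question, mix passages that contain the answer with passages that do not at a chosen noise ratio, then score model outputs by string matching. Setting the noise ratio to 1.0 leaves only answer-free passages, which turns the case into a negative-rejection check where the model should decline to answer. The `Instance` layout, the substring-match scoring rule, and the refusal phrase are all assumptions for illustration, not the authors' released code or their exact metrics.

```python
import random
from dataclasses import dataclass


@dataclass
class Instance:
    question: str
    answers: list[str]   # acceptable gold answer strings
    positive: list[str]  # passages that contain the answer
    negative: list[str]  # related-looking passages without the answer (noise)


def build_context(inst: Instance, k: int, noise_ratio: float,
                  rng: random.Random) -> list[str]:
    """Sample k passages, of which round(k * noise_ratio) are noise passages."""
    n_noise = round(k * noise_ratio)
    n_pos = k - n_noise
    if n_noise > len(inst.negative) or n_pos > len(inst.positive):
        raise ValueError("not enough passages for this k / noise_ratio")
    docs = rng.sample(inst.negative, n_noise) + rng.sample(inst.positive, n_pos)
    rng.shuffle(docs)  # avoid a fixed positive/negative ordering in the prompt
    return docs


def is_correct(output: str, answers: list[str]) -> bool:
    """Assumed scoring rule: correct if any gold answer appears in the output."""
    out = output.lower()
    return any(a.lower() in out for a in answers)


# Hypothetical refusal phrase the model is instructed to emit when the
# passages do not contain the answer (used for negative rejection).
REFUSAL = "insufficient information"


def is_rejection(output: str) -> bool:
    return REFUSAL in output.lower()


def accuracy(outputs: list[str], instances: list[Instance]) -> float:
    hits = sum(is_correct(o, i.answers) for o, i in zip(outputs, instances))
    return hits / len(instances)


def rejection_rate(outputs: list[str]) -> float:
    return sum(is_rejection(o) for o in outputs) / len(outputs)
```

Under these assumptions, sweeping `noise_ratio` from 0.0 upward and plotting `accuracy` shows how quickly a model degrades as retrieved context gets noisier, while `rejection_rate` at `noise_ratio=1.0` measures how often it correctly refuses. Substring matching is chosen here only for simplicity; it can over-credit long outputs that happen to mention the answer.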

Citation impact

311 total citations

FWCI: 41.90
Percentile: 100%
References: 50

Authors: 4

Topics & keywords

Keywords

Benchmarking
Computer science
Natural language processing
Information retrieval
Artificial intelligence
Business

UN Sustainable Development Goals

Quality Education

Funding

National Natural Science Foundation of China
Award: YSBR-040