Synthesizing scientific literature with retrieval-augmented language models

Asai, Akari; He, Jacqueline; Shao, Rulin; Shi, Weijia; Singh, Amanpreet; Chang, Joseph Chee; Lo, Kyle Shih-Huang; Soldaini, Luca; Feldman, Sergey; D’Arcy, Mike; Wadden, David; Latzke, Matt; Sparks, Jenna; Hwang, Jena D.; Kishore, Varsha; Tian, Minyang; Ji, Pan; Liu, Shengyan; Tong, Hao; Wu, Bohao; Xiong, Yanyu; Zettlemoyer, Luke; Neubig, Graham; Weld, Daniel S.; Downey, Doug; Yih, Wen-tau; Koh, Pang Wei; Hajishirzi, Hannaneh

doi:10.1038/s41586-025-10072-4

Article · Nature · Feb 4, 2026 · Hybrid OA

University of Washington · Allen Institute · +4 more institutions

Indexed in: Crossref, PubMed

Abstract

Scientific progress depends on the ability of researchers to synthesize the growing body of literature. Can large language models (LLMs) assist scientists in this task? Here we introduce OpenScholar, a specialized retrieval-augmented language model (LM)¹ that answers scientific queries by identifying relevant passages from 45 million open-access papers and synthesizing citation-backed responses. To evaluate OpenScholar, we develop ScholarQABench, the first large-scale multi-domain benchmark for literature search, comprising 2,967 expert-written queries and 208 long-form answers across computer science, physics, neuroscience and biomedicine. Despite being a smaller open model, OpenScholar-8B outperforms GPT-4o…

Citation impact

Total citations: 9

FWCI: 74.17
Percentile: 100%
References: 44

Authors: 28

Topics & keywords

Keywords

Correctness
Inference
Benchmark (surveying)
Language model
Task (project management)
Citation

UN Sustainable Development Goals

Quality Education
