Reliability of LLMs as medical assistants for the general public: a randomized preregistered study

Bean, Andrew M.; Payne, Rebecca; Parsons, Guy; Kirk, Hannah Rose; Ciro, Juan; Mosquera-Gómez, Rafael; M, Sara Hincapié; Ekanayaka, Aruna S.; Tarassenko, Lionel; Rocher, Luc; Mahdi, Adam

doi:10.1038/s41591-025-04074-y

Article · Nature Medicine · Feb 1, 2026 · Hybrid OA

University of Oxford · Betsi Cadwaladr University Health Board · +6 more institutions

Indexed in: Crossref, PubMed

Abstract

Global healthcare providers are exploring the use of large language models (LLMs) to provide medical advice to the public. LLMs now achieve nearly perfect scores on medical licensing exams, but this does not necessarily translate to accurate performance in real-world settings. We tested whether LLMs can assist members of the public in identifying underlying conditions and choosing a course of action (disposition) in ten medical scenarios in a controlled study with 1,298 participants. Participants were randomly assigned to receive assistance from an LLM (GPT-4o, Llama 3, Command R+) or a source of their choice (control). Tested alone, LLMs complete the scenarios accurately, correctly identifying conditions in…

Citation impact

Total citations: 32

FWCI: 268.69
Percentile: 100%
References: 25

Authors

11 authors

Andrew M. Bean · University of Oxford
Rebecca Payne · Betsi Cadwaladr University Health Board, Bangor University, University of Oxford
Guy Parsons · National Health Service, University of Oxford
Hannah Rose Kirk · University of Oxford
Juan Ciro · Contextual Change (United States)

Topics & keywords

Keywords

Reliability (semiconductor)
Control (management)
Action (physics)
Disposition
Medical advice
Software deployment
Public health
Health care
