A framework to assess clinical safety and hallucination rates of LLMs for medical text summarisation

Asgari, Elham; Montaña-Brown, Nina; Dubois, Magda; Khalil, Saleh; Balloch, Jasmine; Yeung, Joshua Au; Pimenta, Dominic

doi:10.1038/s41746-025-01670-7

Article · npj Digital Medicine · May 13, 2025 · Gold OA

Nicolaus Copernicus University

Indexed in: Crossref, DOAJ, PubMed

Abstract

Integrating large language models (LLMs) into healthcare can enhance workflow efficiency and patient care by automating tasks such as summarising consultations. However, fidelity between LLM outputs and ground-truth information is vital to prevent miscommunication that could compromise patient safety. We propose a framework comprising (1) an error taxonomy for classifying LLM outputs, (2) an experimental structure for iterative comparisons in our LLM document generation pipeline, (3) a clinical safety framework to evaluate the harms of errors, and (4) a graphical user interface, CREOLA, to facilitate these processes. Our clinical error metrics were derived from 18 experimental configurations…

Citation impact

Total citations: 188

FWCI: 114.00
Percentile: 100%
References: 53

Authors: 7

Topics & keywords

Keywords

Psychology
Psychiatry