Article · npj Digital Medicine · May 13, 2025 · Gold Open Access

A framework to assess clinical safety and hallucination rates of LLMs for medical text summarisation

Nicolaus Copernicus University

Indexed in: Crossref, DOAJ, PubMed

Abstract

Integrating large language models (LLMs) into healthcare can improve workflow efficiency and patient care by automating tasks such as summarising consultations. However, LLM outputs must remain faithful to the ground-truth information to prevent miscommunication that could compromise patient safety. We propose a framework comprising (1) an error taxonomy for classifying LLM outputs, (2) an experimental structure for iterative comparisons in our LLM document generation pipeline, (3) a clinical safety framework to evaluate the harms of errors, and (4) a graphical user interface, CREOLA, to facilitate these processes. Our clinical error metrics were derived from 18 experimental configurations…
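To make the idea of clinical error metrics derived per experimental configuration concrete, the following is a minimal sketch, not the authors' implementation. It assumes that clinicians annotate each error in a generated summary with a category and a severity, and that a rate is reported as errors per annotated sentence. Only "hallucination" comes from the paper's title. The "omission" category, the "major"/"minor" severity labels, the per-sentence denominator, and the names `Annotation` and `error_rates` are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical taxonomy: "hallucination" is named in the paper's title;
# "omission" and the severity labels are illustrative assumptions.
CATEGORIES = ("hallucination", "omission")
SEVERITIES = ("major", "minor")


@dataclass(frozen=True)
class Annotation:
    config_id: str  # experimental configuration that produced the summary
    category: str   # one of CATEGORIES
    severity: str   # one of SEVERITIES

    def __post_init__(self):
        # Reject labels outside the assumed taxonomy.
        if self.category not in CATEGORIES or self.severity not in SEVERITIES:
            raise ValueError(f"unknown label: {self.category}/{self.severity}")


def error_rates(annotations, sentences_per_config):
    """Return {config_id: {(category, severity): rate}}.

    The rate is the error count divided by the number of annotated
    sentences for that configuration (an assumed denominator).
    """
    counts = Counter((a.config_id, a.category, a.severity) for a in annotations)
    return {
        config_id: {
            (cat, sev): counts[(config_id, cat, sev)] / n_sentences
            for cat in CATEGORIES
            for sev in SEVERITIES
        }
        for config_id, n_sentences in sentences_per_config.items()
    }


# Illustrative data for two hypothetical configurations, "A" and "B".
annotations = [
    Annotation("A", "hallucination", "major"),
    Annotation("A", "omission", "minor"),
    Annotation("B", "omission", "minor"),
]
rates = error_rates(annotations, {"A": 50, "B": 40})
print(rates["A"][("hallucination", "major")])  # 0.02
```

The design choice here is to report every category and severity combination for every configuration, including zeros. Two configurations can then be compared cell by cell when one pipeline change is evaluated against another.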
