Large language models for data extraction from unstructured and semi-structured electronic health records: a multiple model performance evaluation

Ntinopoulos, Vasileios; Biefer, Héctor Rodríguez Cetina; Tudorache, I.; Papadopoulos, Nestoras; Odavic, Dragan; Risteski, Petar; Haeussler, Achim; Dzemali, Omer

doi:10.1136/bmjhci-2024-101139

Article · BMJ Health & Care Informatics · Jan 1, 2025 · Gold OA

Triemli Hospital · University Hospital of Zurich

Indexed in: Crossref, DOAJ, PubMed

Abstract

Objectives

We aimed to evaluate the performance of multiple large language models (LLMs) in data extraction from unstructured and semi-structured electronic health records.

Methods

Fifty synthetic medical notes in English, each containing a structured and an unstructured part, were drafted and evaluated by domain experts and subsequently used for LLM prompting. Eighteen LLMs were evaluated against a baseline transformer-based model. Performance assessment comprised four entity extraction and five binary classification tasks, giving a total of 450 predictions per LLM (nine tasks across 50 notes). Consistency of LLM responses was assessed over three iterations of the same prompt.
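The arithmetic behind the evaluation design can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' evaluation code: the abstract does not state which performance or consistency metrics were used, so plain accuracy and exact-match agreement across iterations are assumptions here, and all function names are hypothetical.

```python
# Illustrative sketch only; metric choices and names are assumptions,
# not taken from the paper.

N_NOTES = 50          # synthetic medical notes (from the abstract)
N_ENTITY_TASKS = 4    # entity extraction tasks (from the abstract)
N_BINARY_TASKS = 5    # binary classification tasks (from the abstract)
N_ITERATIONS = 3      # same-prompt iterations (from the abstract)


def predictions_per_model(n_notes, n_entity_tasks, n_binary_tasks):
    """One prediction per task per note."""
    return n_notes * (n_entity_tasks + n_binary_tasks)


def accuracy(predictions, gold):
    """Fraction of predictions that exactly match the gold labels (assumed metric)."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)


def consistency_rate(runs):
    """Fraction of items answered identically in every iteration.

    `runs` is a list of iterations; each iteration is a list of responses
    aligned by item. Exact-match agreement is an assumed definition.
    """
    per_item = list(zip(*runs))  # regroup responses by item
    identical = sum(len(set(responses)) == 1 for responses in per_item)
    return identical / len(per_item)


total = predictions_per_model(N_NOTES, N_ENTITY_TASKS, N_BINARY_TASKS)
print(total)  # 450
```

With three iterations of toy responses such as `[["yes", "no"], ["yes", "no"], ["yes", "yes"]]`, `consistency_rate` counts only the first item as consistent, since the second item's answer changes in the third iteration.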

Citation impact

Total citations: 50

FWCI: 94.57
Percentile: 100%
References: 19

Keywords

Unstructured data
Health records
Computer science
Language model
Data extraction
Natural language processing
Data mining
Data science
