Large Language Models Are Poor Medical Coders — Benchmarking of Medical Code Querying

Soroush, Ali; Glicksberg, Benjamin S.; Zimlichman, Eyal; Barash, Yiftach; Freeman, Robert; Charney, Alexander W.; Nadkarni, Girish N.; Klang, Eyal

doi:10.1056/aidbp2300040

Article · NEJM AI · Apr 19, 2024 · Bronze OA

Icahn School of Medicine at Mount Sinai · Tel Aviv University · +2 more institutions

Indexed in Crossref

Abstract

BACKGROUND: Large language models (LLMs) have attracted significant interest for automated clinical coding. However, early data show that LLMs are highly error-prone when mapping medical codes. We sought to quantify and benchmark LLM medical code querying errors across several available LLMs.

Citation impact

Total citations: 117

FWCI: 24.69
Percentile: 100%
References: 14

Authors

Ali Soroush (corresponding author): Icahn School of Medicine at Mount Sinai
Benjamin S. Glicksberg: Icahn School of Medicine at Mount Sinai
Eyal Zimlichman: Tel Aviv University, Sheba Medical Center
Yiftach Barash: Tel Aviv University, Sheba Medical Center
Robert Freeman: Mount Sinai Health System

Topics & keywords

Topics

Biomedical Text Mining and Ontologies: 100%
Artificial Intelligence in Healthcare: 99%
Artificial Intelligence in Healthcare and Education: 99%

Keywords

Benchmarking
Computer science
Code (set theory)
Diagnosis code
Natural language processing
Programming language
Medicine
Business

UN Sustainable Development Goals

No poverty

Funding

AGA Research Foundation
National Institutes of Health