Assessment of a Large Language Model’s Responses to Questions and Cases About Glaucoma and Retina Management

Huang, Andy; Hirabayashi, Kyle; Barna, Laura; Parikh, Deep; Pasquale, Louis R.

doi:10.1001/jamaophthalmol.2023.6917

Letter · JAMA Ophthalmology · Feb 22, 2024 · Hybrid OA

Icahn School of Medicine at Mount Sinai · Massachusetts Eye and Ear Infirmary · +1 more institution

Indexed in: Crossref, PubMed

Abstract

Importance

Large language models (LLMs) are revolutionizing medical diagnosis and treatment, offering unprecedented accuracy and ease surpassing conventional search engines. Their integration into medical assistance programs will become pivotal for ophthalmologists as an adjunct for practicing evidence-based medicine. Therefore, the diagnostic and treatment accuracy of LLM-generated responses compared with fellowship-trained ophthalmologists can help assess their accuracy and validate their potential utility in ophthalmic subspecialties.

Objective

To compare the diagnostic accuracy and comprehensiveness of responses from an LLM chatbot with those of fellowship-trained glaucoma and retina specialists on ophthalmological questions and real patient case management.

Design, Setting, and Participants

This comparative cross-sectional study recruited 15 participants aged 31 to 67 years, including 12 attending physicians and 3 senior trainees, from eye clinics affiliated with the Department of Ophthalmology at Icahn School of Medicine at Mount Sinai, New York, New York. Glaucoma and retina questions (10 of each type) were randomly selected from the American Academy of Ophthalmology's commonly asked questions Ask an Ophthalmologist. Deidentified glaucoma and retinal cases (10 of each type) were randomly selected from ophthalmology patients seen at Icahn School of Medicine at Mount Sinai-affiliated clinics. The LLM used was GPT-4 (version dated May 12, 2023). Data were collected from June to August 2023.

Main Outcomes and Measures

Responses were assessed via a Likert scale for medical accuracy and completeness. Statistical analysis involved the Mann-Whitney U test and the Kruskal-Wallis test, followed by pairwise comparison.
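
As an illustration of the Mann-Whitney U test named in the statistical analysis, the following is a minimal sketch of how the U statistic can be computed for two independent groups of ordinal ratings. It is not the authors' analysis code. The function names `midranks` and `mann_whitney_u`, the assumed 1-to-5 rating range, and the example scores are all hypothetical and invented for illustration; no values here come from the study.

```python
def midranks(values):
    """Assign 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """Return (U_a, U_b) for two independent samples, using midranks for ties."""
    ranks = midranks(list(a) + list(b))
    rank_sum_a = sum(ranks[:len(a)])
    u_a = rank_sum_a - len(a) * (len(a) + 1) / 2
    u_b = len(a) * len(b) - u_a
    return u_a, u_b


# Hypothetical ratings on an assumed 1-to-5 scale (illustrative only).
chatbot_scores = [5, 4, 5, 3, 4]
specialist_scores = [4, 3, 4, 2, 3]
u_chatbot, u_specialist = mann_whitney_u(chatbot_scores, specialist_scores)
print(u_chatbot, u_specialist)
```

The sketch stops at the U statistic and does not compute a p value. Likert data are heavily tied, so a real analysis would typically rely on a statistics package that applies a tie correction, for example `scipy.stats.mannwhitneyu`.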

Citation impact

Total citations: 123

FWCI: 45.81
Percentile: 100%
References: 11

Authors: 5

Topics & keywords

Topics

Keywords

Medicine
Glaucoma
Family medicine
Ophthalmology
Test (biology)
Optometry
