Benchmarking large language models’ performances for myopia care: a comparative analysis of ChatGPT-3.5, ChatGPT-4.0, and Google Bard

Lim, Zhi Wei; Pushpanathan, Krithi; Yew, Samantha Min Er; Lai, Yien; Sun, Chen‐Hsin; Lam, Janice Sing Harn; Chen, David Ziyou; Goh, Jocelyn Hui Lin; Tan, Marcus Chun Jin; Sheng, Bin; Cheng, Ching‐Yu; Koh, Victor; Tham, Yih Chung

doi:10.1016/j.ebiom.2023.104770

Article · EBioMedicine · Aug 22, 2023 · Gold open access

Affiliations: National University of Singapore · National University Health System · and four other institutions

Indexed in: Crossref, DOAJ, PubMed

Abstract

Background

Large language models (LLMs) are garnering wide interest due to their human-like and contextually relevant responses. However, LLMs' accuracy across specific medical domains has yet to be thoroughly evaluated. Myopia is a frequent topic about which patients and parents commonly seek information online. Our study evaluated the performance of three LLMs, namely ChatGPT-3.5, ChatGPT-4.0, and Google Bard, in delivering accurate responses to common myopia-related queries.

Methods

We curated thirty-one commonly asked myopia care-related questions, which were categorised into six domains: pathogenesis, risk factors, clinical presentation, diagnosis, treatment and prevention, and prognosis. Each question was posed to the LLMs, and their responses were independently graded by three consultant-level paediatric ophthalmologists on a three-point accuracy scale (poor, borderline, good). A majority consensus approach was used to determine the final rating for each response. Responses rated 'good' were further evaluated for comprehensiveness on a five-point scale. Conversely, responses rated 'poor' were prompted for self-correction and then re-evaluated for accuracy.
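The grading workflow above (three independent grades, a majority-consensus final rating, then a follow-up step that depends on that rating) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `majority_rating` and `next_step` are hypothetical, and returning `None` for a three-way split (one 'poor', one 'borderline', one 'good') is an assumption, because the abstract does not state how such cases were resolved.

```python
from collections import Counter
from typing import Optional, Sequence

# Three-point accuracy scale described in the Methods.
ACCURACY_LEVELS = ("poor", "borderline", "good")


def majority_rating(grades: Sequence[str]) -> Optional[str]:
    """Return the grade given by a strict majority of graders, else None.

    With three graders, a majority means at least two identical grades.
    ASSUMPTION: a three-way split has no majority and is returned as None,
    since the abstract does not say how such cases were resolved.
    """
    if not grades:
        raise ValueError("at least one grade is required")
    for grade in grades:
        if grade not in ACCURACY_LEVELS:
            raise ValueError(f"unknown grade: {grade!r}")
    top_grade, count = Counter(grades).most_common(1)[0]
    # A strict majority needs more than half of all grades.
    return top_grade if count > len(grades) / 2 else None


def next_step(final_rating: Optional[str]) -> str:
    """Hypothetical routing of a response after its consensus rating."""
    if final_rating == "good":
        return "rate comprehensiveness on the five-point scale"
    if final_rating == "poor":
        return "prompt the LLM to self-correct, then re-grade accuracy"
    if final_rating == "borderline":
        return "no follow-up step described in the abstract"
    return "unresolved: no majority among graders"


if __name__ == "__main__":
    print(majority_rating(["good", "good", "borderline"]))  # good
    print(majority_rating(["poor", "borderline", "good"]))  # None
    print(next_step(majority_rating(["poor", "poor", "good"])))
```

With three graders, a strict majority is simply "at least two agree", so the `count > len(grades) / 2` check also generalises to larger grading panels without changing the logic.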
