Comparison of Ophthalmologist and Large Language Model Chatbot Responses to Online Patient Eye Care Questions

Bernstein, Isaac A.; Zhang, Y; Govil, Devendra; Majid, Iyad; Chang, Robert T.; Sun, Yang; Shue, Ann; Chou, Jonathan; Schehlein, Emily; Christopher, Karen L.; Groth, Sylvia L.; Ludwig, Cassie A.; Wang, Sophia Y.

doi:10.1001/jamanetworkopen.2023.30320

Article · JAMA Network Open · Aug 22, 2023 · Gold OA

Smith-Kettlewell Eye Research Institute · Stanford University · +4 more institutions

Indexed in: Crossref, DOAJ, PubMed

Abstract

Importance

Large language models (LLMs) like ChatGPT appear capable of performing a variety of tasks, including answering patient eye care questions, but have not yet been evaluated in direct comparison with ophthalmologists. It remains unclear whether LLM-generated advice is accurate, appropriate, and safe for eye patients.

Objective

To evaluate the quality of ophthalmology advice generated by an LLM chatbot in comparison with ophthalmologist-written advice.

Design, Setting, and Participants

This cross-sectional study used deidentified data from an online medical forum in which patient questions received responses written by American Academy of Ophthalmology (AAO)-affiliated ophthalmologists. A masked panel of 8 board-certified ophthalmologists was asked to distinguish between answers generated by the ChatGPT chatbot and human answers. Posts were dated between 2007 and 2016; data were accessed in January 2023, and analysis was performed between March and May 2023.

Main Outcomes and Measures

Identification of chatbot and human answers on a 4-point scale (likely or definitely artificial intelligence [AI] vs likely or definitely human) and evaluation of responses for presence of incorrect information, alignment with perceived consensus in the medical community, likelihood to cause harm, and extent of harm.
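As an illustration of the identification outcome, the sketch below shows one way a 4-point rating could be collapsed into an "AI" or "human" call and scored against the true source of each answer. This is a minimal sketch, not the authors' analysis code: the label strings, the function names `dichotomize` and `identification_accuracy`, and the toy records are all hypothetical and are not data from the study.

```python
from typing import Iterable, Tuple

# Hypothetical label strings for the 4-point scale (assumed, not from the study).
AI_RATINGS = {"definitely_ai", "likely_ai"}
HUMAN_RATINGS = {"likely_human", "definitely_human"}


def dichotomize(rating: str) -> str:
    """Collapse a 4-point rating into a binary 'ai' or 'human' call."""
    if rating in AI_RATINGS:
        return "ai"
    if rating in HUMAN_RATINGS:
        return "human"
    raise ValueError(f"unknown rating: {rating!r}")


def identification_accuracy(records: Iterable[Tuple[str, str]]) -> float:
    """Fraction of (true_source, rating) pairs where the rater's call is correct."""
    records = list(records)
    if not records:
        raise ValueError("no records to score")
    correct = sum(
        1 for true_source, rating in records if dichotomize(rating) == true_source
    )
    return correct / len(records)


# Made-up toy records, for illustration only: (true source, rater's rating).
toy_records = [
    ("ai", "likely_ai"),
    ("ai", "definitely_human"),
    ("human", "likely_human"),
    ("human", "definitely_human"),
]

print(identification_accuracy(toy_records))
```

Collapsing the scale before scoring keeps the accuracy measure simple; an analysis that also wants to use rater confidence would need to keep the "likely" and "definitely" levels separate.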

Citation impact

Total citations: 244

FWCI: 8.83
Percentile: 100%
References: 32

Authors: 13

Topics & keywords

Keywords

Chatbot
Harm
Medicine
Eye care
Medical education
Advice (programming)
Family medicine
Psychology

Funding

Research to Prevent Blindness
Award: P30-EY026877