Article · JAMA Network Open · Oct 2, 2023 · Gold OA

Accuracy and Reliability of Chatbot Responses to Physician Questions

Vanderbilt University · Vanderbilt University Medical Center · and 4 other institutions

Indexed in Crossref, DOAJ, PubMed

Abstract

Importance

Natural language processing tools, such as ChatGPT (generative pretrained transformer, hereafter referred to as chatbot), have the potential to radically enhance the accessibility of medical information for health professionals and patients. Assessing the safety and efficacy of these tools in answering physician-generated questions is critical to determining their suitability in clinical settings, facilitating complex decision-making, and optimizing health care efficiency.

Objective

To assess the accuracy and comprehensiveness of chatbot-generated responses to physician-developed medical queries, highlighting the reliability and limitations of artificial intelligence-generated medical information.

Design, Setting, and Participants

Thirty-three physicians across 17 specialties generated 284 medical questions that they subjectively classified as easy, medium, or hard with either binary (yes or no) or descriptive answers. The physicians then graded the chatbot-generated answers to these questions for accuracy (6-point Likert scale with 1 being completely incorrect and 6 being completely correct) and completeness (3-point Likert scale, with 1 being incomplete and 3 being complete plus additional context). Scores were summarized with descriptive statistics and compared using the Mann-Whitney U test or the Kruskal-Wallis test. The study (including data analysis) was conducted from January to May 2023.

Main Outcomes and Measures

Accuracy, completeness, and consistency over time and between 2 different versions (GPT-3.5 and GPT-4) of chatbot-generated medical responses.
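The abstract states that the Likert grades were summarized with descriptive statistics and compared with the Mann-Whitney U test (two groups) or the Kruskal-Wallis test (three or more groups, such as easy, medium, and hard questions). It does not say which software, tie handling, or corrections were used. The following is a minimal Python sketch of the two rank-based tests, assuming average ranks for ties, a normal approximation without continuity correction for Mann-Whitney U, and the chi-square approximation for Kruskal-Wallis. The helper names (`average_ranks`, `mann_whitney_u`, `kruskal_wallis_3`) and all example scores are hypothetical and are not data from the study.

```python
"""Sketch: comparing ordinal Likert grades between groups with rank-based tests.

All scores in this file are HYPOTHETICAL placeholders, not data from the study.
"""
import math
from statistics import median


def average_ranks(values):
    """Return 1-based ranks for `values`, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def tie_term(values):
    """Sum of (t^3 - t) over each group of t tied values (used in tie corrections)."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return sum(t ** 3 - t for t in counts.values())


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test, normal approximation with tie-corrected variance.

    Returns (U for sample x, approximate p-value).
    """
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term(pooled) / (n * (n - 1)))
    if var_u == 0:
        raise ValueError("all scores are tied; the test is undefined")
    z = (u1 - mean_u) / math.sqrt(var_u)
    return u1, math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail


def kruskal_wallis_3(groups):
    """Kruskal-Wallis H test for exactly three groups (chi-square approximation, df = 2).

    Returns (tie-corrected H, approximate p-value).
    """
    if len(groups) != 3 or any(len(g) == 0 for g in groups):
        raise ValueError("this sketch handles exactly three non-empty groups")
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    correction = 1 - tie_term(pooled) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all scores are tied; the test is undefined")
    h /= correction
    # With 3 groups, df = 2, and the chi-square(2) survival function is exp(-H/2).
    return h, math.exp(-h / 2)


if __name__ == "__main__":
    # HYPOTHETICAL accuracy grades (1-6 Likert) by question difficulty.
    easy = [6, 6, 5, 6, 4, 6]
    medium = [5, 6, 4, 5, 6, 3]
    hard = [4, 5, 2, 6, 3, 5]
    print("medians:", median(easy), median(medium), median(hard))
    print("Kruskal-Wallis (H, p):", kruskal_wallis_3([easy, medium, hard]))
    print("Mann-Whitney easy vs hard (U, p):", mann_whitney_u(easy, hard))
```

Rank-based tests fit this design because Likert grades are ordinal rather than interval data. In practice, library routines such as `scipy.stats.mannwhitneyu` and `scipy.stats.kruskal` would normally be used instead of hand-written code; their defaults (for example, continuity correction or exact p-values for small samples) can give slightly different p-values from this sketch.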

Citation impact

Total citations: 433
FWCI (field-weighted citation impact): 15.66
Percentile: 100%
References: 12

Authors: 35

Topics & keywords

Keywords
  • Chatbot
  • Likert scale
  • Descriptive statistics
  • Context
  • Test
  • Computer science
  • Reliability
  • Medicine
UN Sustainable Development Goals
  • Peace, Justice and strong institutions