Article · Annals of Surgical Treatment and Research · Jan 1, 2023 · Diamond OA

ChatGPT goes to the operating room: evaluating GPT-4 performance and its potential in surgical education and training in the era of large language models

Samsung Medical Center · Sungkyunkwan University

Indexed in: Crossref, PubMed

Abstract

Methods

The dataset comprised 280 questions from the Korean general surgery board exams conducted between 2020 and 2022. Both the GPT-3.5 and GPT-4 models were evaluated on these questions, and their performances were compared using the McNemar test.
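
Because both models answer the same questions, the comparison is paired, and the McNemar test uses only the discordant questions (those that one model answered correctly and the other did not). The following is a minimal sketch of the exact (binomial) form of the test, not the authors' actual analysis code. The function name `mcnemar_exact` and the counts in the usage example are hypothetical and are not taken from the paper.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the two discordant counts.

    b: questions model A answered correctly and model B answered incorrectly
    c: questions model A answered incorrectly and model B answered correctly
    Questions both models got right (or both got wrong) do not enter the test.
    """
    n = b + c
    if n == 0:
        # No discordant pairs: the data give no evidence of a difference.
        return 1.0
    k = min(b, c)
    # Under the null hypothesis each discordant pair is equally likely to
    # favor either model, so the smaller count follows Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # Double the one-sided tail for a two-sided test, capped at 1.
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # Hypothetical discordant counts, for illustration only.
    p_value = mcnemar_exact(b=10, c=40)
    print(f"exact McNemar p-value: {p_value:.3g}")
```

The exact form is used here because it needs only the standard library and stays valid when discordant counts are small; with many discordant pairs, the chi-square approximation gives similar results.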

Results

GPT-3.5 achieved an overall accuracy of 46.8%, while GPT-4 demonstrated a significant improvement with an overall accuracy of 76.4%, indicating a notable difference in performance between the models (P

Citation impact

Total citations: 189
FWCI: 6.97
Percentile: 100%
References: 7

Authors

3

Topics & keywords

Keywords
  • McNemar's test
  • Ranging
  • Medical physics
  • Computer science
  • Medicine
  • Medical education
  • Statistics
  • Mathematics
UN Sustainable Development Goals
  • Quality Education