ChatGPT goes to the operating room: evaluating GPT-4 performance and its potential in surgical education and training in the era of large language models

Oh, Namkee; Choi, Gyu‐Seong; Lee, Woo Yong

doi:10.4174/astr.2023.104.5.269

Article · Annals of Surgical Treatment and Research · Jan 1, 2023 · Diamond OA

Samsung Medical Center · Sungkyunkwan University

Indexed in: Crossref, PubMed

Abstract

Methods

The dataset comprised 280 questions from the Korean general surgery board exams conducted between 2020 and 2022. Both GPT-3.5 and GPT-4 were evaluated on these questions, and their performance was compared using the McNemar test.
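Because both models answer the same questions, the McNemar test looks only at the discordant pairs: questions one model answered correctly and the other did not. The following is a minimal sketch of that comparison, not the authors' analysis code. The function names and the counts in the usage lines are hypothetical; the per-question results behind the reported accuracies are not given here.

```python
from math import comb, erfc, sqrt


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact (binomial) McNemar p-value.

    b: questions only model A answered correctly (hypothetical input)
    c: questions only model B answered correctly (hypothetical input)
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def mcnemar_chi2(b: int, c: int) -> tuple[float, float]:
    """Chi-square McNemar statistic with continuity correction, 1 df."""
    n = b + c
    if n == 0:
        return 0.0, 1.0
    stat = max(abs(b - c) - 1, 0) ** 2 / n
    # Survival function of chi-square with 1 df: erfc(sqrt(x / 2))
    return stat, erfc(sqrt(stat / 2))


# Toy counts for illustration only, not taken from the paper.
p_exact = mcnemar_exact(10, 2)
stat, p_chi2 = mcnemar_chi2(10, 2)
print(f"exact p = {p_exact:.4f}; chi2 = {stat:.3f}, p = {p_chi2:.4f}")
```

The exact form is usually preferred when the number of discordant pairs is small; with many discordant pairs the two versions give similar p-values.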

Results

GPT-3.5 achieved an overall accuracy of 46.8%, while GPT-4 demonstrated a significant improvement with an overall accuracy of 76.4%, indicating a notable difference in performance between the models (P

Citation impact

Total citations: 189

FWCI: 6.97
Percentile: 100%
References: 7

Authors: 3

Topics & keywords

Keywords

McNemar's test
Ranging
Medical physics
Computer science
Medicine
Medical education
Statistics
Mathematics

UN Sustainable Development Goals

Quality Education

Funding

Samsung