ChatGPT goes to the operating room: evaluating GPT-4 performance and its potential in surgical education and training in the era of large language models
Samsung Medical Center · Sungkyunkwan University
Indexed in Crossref and PubMed
Abstract
Methods
The dataset comprised 280 questions from the Korean general surgery board exams conducted between 2020 and 2022. Both GPT-3.5 and GPT-4 were evaluated, and their performances were compared using the McNemar test.
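The McNemar test fits this design because both models answer the same questions, so each question yields a paired correct/incorrect outcome. The test uses only the discordant pairs: b, the questions only GPT-3.5 answered correctly, and c, the questions only GPT-4 answered correctly. Below is a minimal sketch, assuming each model's answers are scored as a per-question boolean list over the same question set. The source does not say whether the exact binomial form or the continuity-corrected chi-square form, χ² = (|b − c| − 1)² / (b + c), was used, so both are shown. The function names and the toy data are hypothetical, not the study's code or results.

```python
import math
from typing import Sequence, Tuple


def discordant_counts(a_correct: Sequence[bool], b_correct: Sequence[bool]) -> Tuple[int, int]:
    """Count paired questions where exactly one model answered correctly.

    Returns (b, c): b = model A right and model B wrong, c = the reverse.
    """
    if len(a_correct) != len(b_correct):
        raise ValueError("both models must be scored on the same questions")
    b = sum(1 for x, y in zip(a_correct, b_correct) if x and not y)
    c = sum(1 for x, y in zip(a_correct, b_correct) if y and not x)
    return b, c


def mcnemar_chi2(b: int, c: int, correction: bool = True) -> Tuple[float, float]:
    """McNemar chi-square statistic (1 degree of freedom) and its p-value."""
    if b + c == 0:
        return 0.0, 1.0
    diff = abs(b - c)
    if correction:
        # Edwards continuity correction: subtract 1 from |b - c| (floored at 0).
        diff = max(diff - 1, 0)
    stat = diff * diff / (b + c)
    # Chi-square survival function with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p


def mcnemar_exact(b: int, c: int) -> float:
    """Exact two-sided McNemar p-value: binomial test of min(b, c) under Binomial(b + c, 0.5)."""
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # Hypothetical toy scores for six questions (not data from the study).
    gpt35 = [True, False, False, True, False, False]
    gpt4 = [True, True, True, True, False, True]
    b, c = discordant_counts(gpt35, gpt4)
    print((b, c))                # prints (0, 3)
    print(mcnemar_exact(b, c))   # prints 0.25
    print(mcnemar_chi2(b, c))
```

Questions that both models answered correctly, or both answered incorrectly, carry no information about which model is better, which is why neither form of the test uses them.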
Results
GPT-3.5 achieved an overall accuracy of 46.8%, whereas GPT-4 achieved 76.4%, a significant improvement over GPT-3.5 (P
Citation impact
- Total citations: 189
- FWCI: 6.97
- Percentile: 100%
- References: 7
Topics & keywords
- McNemar's test
- Ranging
- Medical physics
- Computer science
- Medicine
- Medical education
- Statistics
- Mathematics
UN Sustainable Development Goals
- Quality Education