Comparison of ChatGPT–3.5, ChatGPT-4, and Orthopaedic Resident Performance on Orthopaedic Assessment Examinations

Massey, Patrick A.; Montgomery, Carver; Zhang, Andrew S.

doi:10.5435/jaaos-d-23-00396

Article · Journal of the American Academy of Orthopaedic Surgeons · Sep 4, 2023 · Hybrid OA

Louisiana State University in Shreveport · Louisiana State University Health Sciences Center Shreveport

Indexed in: Crossref, PubMed

Abstract

Introduction

Artificial intelligence (AI) programs can answer complex queries, including medical profession examination questions. The purpose of this study was to compare the performance of orthopaedic residents (ortho residents) with that of Chat Generative Pretrained Transformer (ChatGPT)-3.5 and GPT-4 on orthopaedic assessment examinations. A secondary objective was to perform a subgroup analysis comparing the performance of each group on questions that included image interpretation versus text-only questions.

Methods

The ResStudy orthopaedic examination question bank was used as the primary source of questions. One hundred eighty questions and answer choices from nine different orthopaedic subspecialties were directly input into ChatGPT-3.5 and then GPT-4. ChatGPT did not have consistently available image interpretation, so no images were directly provided to either AI model. Each chatbot answer was recorded as correct or incorrect, and resident performance was recorded based on user data provided by ResStudy.

Citation impact

193 total citations

FWCI: 7.00
Percentile: 100%
References: 21

Keywords

Medicine
Subgroup analysis
Interpretation (philosophy)
Orthopedic surgery
Medical physics
Surgery
Internal medicine
Meta-analysis

UN Sustainable Development Goals

Quality Education
