AI versus human-generated multiple-choice questions for medical education: a cohort study in a high-stakes examination

Law, Alex Kwok-Keung; So, Jerome Lok Tsun; Lui, Chun Tat; Choi, Yu Fai; Cheung, Koon Ho; Hung, Kevin Kei Ching; Graham, Colin A.

doi:10.1186/s12909-025-06796-6

Article · BMC Medical Education · Feb 8, 2025 · Gold OA

Chinese University of Hong Kong · Hong Kong College of Technology · +2 more institutions

Indexed in: Crossref, DOAJ, PubMed

Abstract

Background

The creation of high-quality multiple-choice questions (MCQs) is essential for medical education assessments but is resource-intensive and time-consuming when done by human experts. Large language models (LLMs) like ChatGPT-4o offer a promising alternative, but their efficacy remains unclear, particularly in high-stakes exams.

Objective

This study aimed to evaluate the quality and psychometric properties of ChatGPT-4o-generated MCQs compared to human-created MCQs in a high-stakes medical licensing exam.

Citation impact

Total citations: 60
FWCI: 28.72
Percentile: 100%
References: 27
Authors: 7

Topics & keywords

Keywords

Multiple choice
Medical education
Educational measurement
Cohort
Medicine
MEDLINE
Psychology
Curriculum

Funding

College of Emergency Medicine