Article · BMC Medical Education · Feb 8, 2025 · Gold OA

AI versus human-generated multiple-choice questions for medical education: a cohort study in a high-stakes examination

Chinese University of Hong Kong · Hong Kong College of Technology · +2 more institutions

Indexed in: Crossref, DOAJ, PubMed

Abstract

Background

The creation of high-quality multiple-choice questions (MCQs) is essential for medical education assessments but is resource-intensive and time-consuming when done by human experts. Large language models (LLMs) like ChatGPT-4o offer a promising alternative, but their efficacy remains unclear, particularly in high-stakes exams.

Objective

This study aimed to evaluate the quality and psychometric properties of ChatGPT-4o-generated MCQs compared to human-created MCQs in a high-stakes medical licensing exam.

Citation impact

Total citations: 60
FWCI: 28.72
Percentile: 100%
References: 27

Authors

7

Topics & keywords

Keywords
  • Multiple choice
  • Medical education
  • Educational measurement
  • Cohort
  • Medicine
  • MEDLINE
  • Psychology
  • Curriculum

Funding