Assessing the potential of GPT-4 to perpetuate racial and gender biases in health care: a model evaluation study

Zack, Travis; Lehman, Eric; Süzgün, Mirac; Rodriguez, Jorge A.; Celi, Leo Anthony; Gichoya, Judy Wawira; Jurafsky, Dan; Szolovits, Peter; Bates, David W.; Abdulnour, Raja-Elie E.; Butte, Atul J.; Alsentzer, Emily

doi:10.1016/s2589-7500(23)00225-x

Article · The Lancet Digital Health · Dec 18, 2023 · Gold OA

Massachusetts Institute of Technology · Brigham and Women's Hospital · +2 more institutions

Indexed in: Crossref, DOAJ, PubMed

Abstract

Background

Large language models (LLMs) such as GPT-4 hold great promise as transformative tools in health care, ranging from automating administrative tasks to augmenting clinical decision making. However, these models also pose a danger of perpetuating biases and delivering incorrect medical diagnoses, which can have a direct, harmful impact on medical care. We aimed to assess whether GPT-4 encodes racial and gender biases that impact its use in health care.

Methods

Using the Azure OpenAI application interface, this model evaluation study tested whether GPT-4 encodes racial and gender biases and examined the impact of such biases on four potential applications of LLMs in the clinical domain: medical education, diagnostic reasoning, clinical plan generation, and subjective patient assessment. We conducted experiments with prompts designed to resemble typical use of GPT-4 within clinical and medical education applications. We used clinical vignettes from NEJM Healer and from published research on implicit bias in health care. GPT-4 estimates of the demographic distribution of medical conditions were compared with true US prevalence estimates. Differential diagnosis and treatment planning were evaluated across demographic groups using standard statistical tests for significance between groups.
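
The abstract does not name the statistical tests used. As an illustration only, one common way to compare a model-generated demographic distribution with reference prevalence is a Pearson chi-square goodness-of-fit test. The sketch below assumes two demographic categories; the function names and the counts are hypothetical and are not taken from the study.

```python
import math


def chi_square_gof(observed_counts, reference_props):
    """Return the Pearson chi-square goodness-of-fit statistic and its degrees of freedom.

    observed_counts: how often the model assigned each demographic category.
    reference_props: reference prevalence shares for the same categories (must sum to 1).
    """
    if len(observed_counts) != len(reference_props):
        raise ValueError("need one reference proportion per category")
    if any(p <= 0 for p in reference_props):
        raise ValueError("reference proportions must be positive")
    if not math.isclose(sum(reference_props), 1.0, abs_tol=1e-9):
        raise ValueError("reference proportions must sum to 1")

    total = sum(observed_counts)
    statistic = 0.0
    for observed, prop in zip(observed_counts, reference_props):
        expected = total * prop  # count expected if the model matched the reference
        statistic += (observed - expected) ** 2 / expected
    return statistic, len(observed_counts) - 1


def p_value_one_df(statistic):
    """Upper-tail p-value for a chi-square statistic with exactly 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))


# Hypothetical counts for illustration; NOT data from the study.
model_counts = [70, 30]        # e.g. 100 generated vignettes split across two groups
reference_props = [0.5, 0.5]   # assumed reference prevalence shares

stat, df = chi_square_gof(model_counts, reference_props)
p = p_value_one_df(stat)
print(f"chi-square = {stat:.2f}, df = {df}, p = {p:.2g}")
```

The closed-form p-value holds only for two categories (one degree of freedom); with more categories, a general chi-square survival function from a statistics library would be needed.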

Citation impact

Total citations: 428

FWCI: 15.51
Percentile: 100%
References: 52

Authors: 12

Topics & keywords

Keywords

Transformative learning
Health care
Medical diagnosis
Medical care
Psychology
Medicine
Political science
Nursing

UN Sustainable Development Goals

Gender equality