Medical large language models are vulnerable to data-poisoning attacks

Alber, Daniel Alexander; Yang, Zihao; Alyakin, Anton; Yang, Eunice; Rai, Sumedha; Valliani, Aly; Zhang, Jeff; Rosenbaum, Gabriel R.; Amend-Thomas, Ashley K.; Kurland, David B.; Kremer, C.; Eremiev, Alexander; Negash, Bruck; Wiggan, Daniel D.; Nakatsuka, M.; Sangwon, Karl L.; Neifert, Sean N.; Khan, Hammad A.; Save, Akshay; Palla, Adhith; Grin, Eric A.; Hedman, Monika; Nasir-Moin, Mustafa; Liu, Xujin Chris; Jiang, Lavender Yao; Mankowski, Michal; Segev, Dorry L.; Aphinyanaphongs, Yindalon; Riina, Howard A.; Golfinos, John G.; Orringer, Daniel A.; Kondziolka, Douglas; Oermann, Eric K.

doi:10.1038/s41591-024-03445-1

Article · Nature Medicine · Jan 8, 2025 · Hybrid OA

NYU Langone Health · New York University · +3 more institutions

Indexed in: Crossref, PubMed

Abstract

The adoption of large language models (LLMs) in healthcare demands a careful analysis of their potential to spread false medical knowledge. Because LLMs ingest massive volumes of data from the open Internet during training, they are potentially exposed to unverified medical knowledge that may include deliberately planted misinformation. Here, we perform a threat assessment that simulates a data-poisoning attack against The Pile, a popular dataset used for LLM development. We find that replacement of just 0.001% of training tokens with medical misinformation results in harmful models more likely to propagate medical errors. Furthermore, we discover that corrupted models match the performance of their…
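To give a sense of scale for the 0.001% figure quoted in the abstract, the following minimal sketch converts a percentage of training tokens into an absolute token count. The function name and the corpus size are hypothetical and chosen only for illustration; the corpus size is not a figure taken from the paper or from The Pile.

```python
from fractions import Fraction


def poisoned_token_count(total_tokens: int, percent: Fraction) -> int:
    """Return how many tokens `percent` percent of a corpus amounts to.

    Exact rational arithmetic avoids floating-point rounding on tiny fractions.
    """
    return int(total_tokens * percent / 100)


# Hypothetical corpus size, for illustration only (not a value from the paper).
HYPOTHETICAL_CORPUS_TOKENS = 100_000_000_000

# 0.001% expressed exactly as 1/1000 of a percent.
POISON_PERCENT = Fraction(1, 1000)

count = poisoned_token_count(HYPOTHETICAL_CORPUS_TOKENS, POISON_PERCENT)
print(f"{count:,} of {HYPOTHETICAL_CORPUS_TOKENS:,} tokens")
```

Under this assumption, 0.001% corresponds to one token in every 100,000, which is why even a very large corpus needs only a comparatively small amount of replaced text to reach that threshold.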

Citation impact

Total citations: 147

FWCI: 69.03
Percentile: 100%
References: 48

Authors (33)

Daniel Alexander Alber (corresponding author)
NYU Langone Health, New York University
Zihao Yang
NYU Langone Health, New York University
Anton Alyakin
Washington University in St. Louis, NYU Langone Health
Eunice Yang
NYU Langone Health, Columbia University
Sumedha Rai
NYU Langone Health, New York University

Keywords

Misinformation
Computer science
Harm
The Internet
Internet privacy
Computer security
Health care
Data science