Delving into LLM-assisted writing in biomedical publications through excess vocabulary

Kobak, Dmitry; González-Márquez, Rita; Horvát, Emőke-Ágnes; Lause, Jan

doi:10.1126/sciadv.adt3813

Article · Science Advances · Jul 2, 2025 · Gold open access


Affiliations: Hertie Institute for Clinical Brain Research · University of Tübingen · one further institution (not listed)

Indexed in: Crossref, DOAJ, PubMed

Abstract

Large language models (LLMs) like ChatGPT can generate and revise text with human-level performance. These models come with clear limitations, can produce inaccurate information, and reinforce existing biases. Yet, many scientists use them for their scholarly writing. But how widespread is such LLM usage in the academic literature? To answer this question for the field of biomedical research, we present an unbiased, large-scale approach: We study vocabulary changes in more than 15 million biomedical abstracts from 2010 to 2024 indexed by PubMed and show how the appearance of LLMs led to an abrupt increase in the frequency of certain style words. This excess word analysis suggests that at least 13.5% of 2024…
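The excess-word idea in the abstract can be illustrated with a small sketch. It assumes, as our own simplification rather than anything stated above, that a word's yearly frequency is the fraction of abstracts containing it at least once, and that the expected (counterfactual) frequency for 2024 is a straight-line extrapolation from two pre-LLM reference years, here 2021 and 2022. The function names `yearly_frequency`, `excess_usage`, and `make_year`, the crude tokenizer, and the toy abstracts are all hypothetical illustrations, not code or data from the study.

```python
import re
from collections import defaultdict


def yearly_frequency(abstracts, word):
    """Fraction of abstracts per year that contain `word` at least once."""
    total = defaultdict(int)
    hits = defaultdict(int)
    for year, text in abstracts:
        total[year] += 1
        # Crude tokenizer (assumption): lowercase runs of ASCII letters only.
        tokens = set(re.findall(r"[a-z]+", text.lower()))
        if word.lower() in tokens:
            hits[year] += 1
    return {year: hits[year] / total[year] for year in total}


def excess_usage(freq, target_year, ref_years=(2021, 2022)):
    """Compare observed frequency p with a counterfactual q that is linearly
    extrapolated from two reference years (assumption); returns (p, q, gap, ratio)."""
    y1, y2 = ref_years
    slope = (freq[y2] - freq[y1]) / (y2 - y1)
    # Clamp at zero: a frequency cannot be negative.
    q = max(freq[y2] + slope * (target_year - y2), 0.0)
    p = freq[target_year]
    gap = p - q
    ratio = p / q if q > 0 else float("inf")
    return p, q, gap, ratio


def make_year(year, n_total, n_hits):
    """Hypothetical toy abstracts: n_hits of n_total use the word 'delves'."""
    hit = (year, "This study delves into tumour growth.")
    miss = (year, "We measured tumour growth in mice.")
    return [hit] * n_hits + [miss] * (n_total - n_hits)


# Made-up toy corpus, NOT data from the study.
toy = make_year(2021, 10, 1) + make_year(2022, 10, 1) + make_year(2024, 10, 4)
freq = yearly_frequency(toy, "delves")
p, q, gap, ratio = excess_usage(freq, 2024)
print(f"p={p:.2f} q={q:.2f} gap={gap:.2f} ratio={ratio:.1f}")
# prints p=0.40 q=0.10 gap=0.30 ratio=4.0
```

On this invented corpus, "delves" appears in 10% of abstracts in both reference years and in 40% in 2024, so the sketch reports an excess gap of about 0.3 and a ratio of 4. If one assumes the entire excess is due to LLM-assisted writing, the gap acts as a rough floor on the share of affected abstracts, because each abstract is counted at most once per word; combining evidence across many words, which a "at least X%" style estimate would require, is not shown here.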

Citation impact

78 total citations
FWCI: 37.34
Percentile: 100%
References: 47



Keywords

Vocabulary
Coronavirus disease 2019 (COVID-19)
Pandemic
English language
Psychology
Medicine
Linguistics
Mathematics education

UN Sustainable Development Goals

Quality Education
