Delving into LLM-assisted writing in biomedical publications through excess vocabulary
Hertie Institute for Clinical Brain Research · University of Tübingen · Northwestern University
Abstract
Large language models (LLMs) like ChatGPT can generate and revise text with human-level performance. These models come with clear limitations, can produce inaccurate information, and reinforce existing biases. Yet, many scientists use them for their scholarly writing. But how widespread is such LLM usage in the academic literature? To answer this question for the field of biomedical research, we present an unbiased, large-scale approach: We study vocabulary changes in more than 15 million biomedical abstracts from 2010 to 2024 indexed by PubMed and show how the appearance of LLMs led to an abrupt increase in the frequency of certain style words. This excess word analysis suggests that at least 13.5% of 2024…
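The excess word analysis mentioned in the abstract can be illustrated with a minimal sketch. It assumes that a word's frequency is the fraction of abstracts in a given year that contain it, and that the expected frequency in the target year is a linear extrapolation from two pre-LLM baseline years. Both are illustrative assumptions; the abstract does not spell out the exact procedure. The function names, the whitespace tokenizer, and the toy corpus are hypothetical.

```python
from collections import Counter


def word_frequencies(abstracts):
    """Return, for each word, the fraction of abstracts containing it at least once."""
    counts = Counter()
    for text in abstracts:
        # Naive whitespace tokenization (assumption); a set counts each word once per abstract.
        counts.update(set(text.lower().split()))
    n = len(abstracts)
    return {word: c / n for word, c in counts.items()}


def excess_usage(freq_by_year, word, base_years, target_year):
    """Compare the observed frequency in target_year with a linear extrapolation
    of the trend between the two baseline years (assumed counterfactual)."""
    y0, y1 = base_years
    p0 = freq_by_year[y0].get(word, 0.0)
    p1 = freq_by_year[y1].get(word, 0.0)
    slope = (p1 - p0) / (y1 - y0)
    # Expected frequency had the pre-LLM trend continued; clipped at zero.
    expected = max(p1 + slope * (target_year - y1), 0.0)
    observed = freq_by_year[target_year].get(word, 0.0)
    gap = observed - expected
    ratio = observed / expected if expected > 0 else float("inf")
    return expected, observed, gap, ratio


# Hypothetical toy corpus, made up purely for illustration.
corpus = {
    2021: ["we study cells", "we delve into data", "cells grow", "results show effects"],
    2022: ["we study cells", "we delve into data", "cells grow", "results show effects"],
    2024: ["we delve into cells", "we delve into data", "we delve deeper", "results show effects"],
}
freq_by_year = {year: word_frequencies(texts) for year, texts in corpus.items()}
expected, observed, gap, ratio = excess_usage(freq_by_year, "delve", (2021, 2022), 2024)
print(f"expected={expected:.2f} observed={observed:.2f} gap={gap:.2f} ratio={ratio:.1f}")
```

A word with a large gap or a large ratio is a candidate excess style word. On real data, the tokenizer and the choice of baseline years would need more care than this sketch takes.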
Citation impact
- FWCI: 37.34
- Percentile: 100%
- References: 47
Authors (4)
- Dmitry Kobak (Corresponding)
Hertie Institute for Clinical Brain Research, University of Tübingen
- Rita González-Márquez (Corresponding)
Hertie Institute for Clinical Brain Research, University of Tübingen
- Emőke-Ágnes Horvát (Corresponding)
Northwestern University
- Jan Lause (Corresponding)
Hertie Institute for Clinical Brain Research, University of Tübingen
Topics & keywords
- Vocabulary
- Coronavirus disease 2019 (COVID-19)
- Pandemic
- English language
- Psychology
- Medicine
- Linguistics
- Mathematics education
- Quality Education