Article · Journal of Clinical Epidemiology · Feb 23, 2026 · Hybrid OA

Quantifying new threats to health and biomedical literature integrity from rapidly scaled publications and problematic research

University of Surrey · Aberystwyth University · +3 more institutions

Indexed in: Crossref, PubMed

Abstract

Methods

Here we use a scientometric analysis to investigate which datasets have seen publication rates deviate from previous trends, especially where this coincides with changes to author geographical origins and increases in formulaic titles.
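The idea of a publication rate deviating from its previous trend can be illustrated with a minimal sketch. This is not the authors' pipeline: a simple least-squares line stands in for the forecasting model, and the yearly counts, the helper name `linear_forecast`, and the variable names are all hypothetical, chosen only to show how an excess over an extrapolated trend is computed.

```python
def linear_forecast(years, counts, target_year):
    """Fit an ordinary least-squares line to (year, count) and extrapolate it."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(counts) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, counts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept + slope * target_year


# Hypothetical yearly publication counts for one dataset (not from the paper).
baseline_years = [2018, 2019, 2020, 2021, 2022]
baseline_counts = [100, 110, 120, 130, 140]
observed_2025 = 400  # hypothetical observed count

expected_2025 = linear_forecast(baseline_years, baseline_counts, 2025)
excess_2025 = observed_2025 - expected_2025  # publications above the trend

print(f"expected {expected_2025:.0f}, observed {observed_2025}, excess {excess_2025:.0f}")
```

A dataset would be of interest when the observed count sits well above the extrapolated value; how large a gap counts as a deviation is a threshold choice that this sketch leaves open.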

Results

Across 36 datasets, we identify nine showing hallmarks of paper mill exploitation (FDA Adverse Event Reporting System, National Health and Nutrition Examination Survey, UK Biobank, FinnGen, the Global Burden of Disease Study, Medical Information Mart for Intensive Care, China Health and Retirement Longitudinal Study, Centers for Disease Control and Prevention Wide-ranging Online Data for Epidemiologic Research, and TriNetX). In 2025, these nine datasets had a combined count of 23,005 publications indexed in the OpenAlex database. This is an excess of 11,577 publications above the autoregressive integrated moving average (ARIMA) forecast trend, and a 3.0-fold change relative to the 7,655 publications for these nine datasets in 2022. We also identified a notable difference in fold change between China (4.2×) and the rest of the world (1.9×), and an increase in formulaic titles.
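The headline figures can be checked with a few lines of arithmetic. The sketch below uses only the three counts reported above; the variable names are illustrative, and the implied forecast is derived here by subtraction rather than quoted from the paper.

```python
# Counts reported in the Results for the nine flagged datasets.
observed_2025 = 23_005         # publications indexed in OpenAlex in 2025
excess_over_forecast = 11_577  # publications above the ARIMA forecast trend
baseline_2022 = 7_655          # publications in 2022

# Forecast implied by the reported excess (derived, not stated in the source).
implied_forecast_2025 = observed_2025 - excess_over_forecast

# Fold change of 2025 relative to 2022.
fold_change = observed_2025 / baseline_2022

# Share of the 2025 output that sits above the forecast trend.
excess_share = excess_over_forecast / observed_2025

print(f"implied forecast: {implied_forecast_2025}")
print(f"fold change: {fold_change:.1f}x")
print(f"excess share: {excess_share:.1%}")
```

On these numbers, roughly half of the 2025 output for the nine datasets lies above the forecast trend, which is consistent with the reported 3.0-fold change.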


Funding