AI models collapse when trained on recursively generated data

Shumailov, Ilia; Shumaylov, Zakhar; Zhao, Yiren; Papernot, Nicolas; Anderson, Ross; Gal, Yarin

doi:10.1038/s41586-024-07566-y

Article · Nature · Jul 24, 2024 · Hybrid OA

University of Oxford · University of Cambridge · +5 more institutions

Indexed in: Crossref, PubMed

Abstract

… demonstrated high performance across a variety of language tasks. ChatGPT introduced such language models to the public. It is now clear that generative artificial intelligence (AI) such as large language models (LLMs) is here to stay and will substantially change the ecosystem of online text and images. Here we consider what may happen to GPT-{n} once LLMs contribute much of the text found online. We find that indiscriminate use of model-generated content in training causes irreversible defects in the resulting models, in which tails of the original content distribution disappear. We refer to this effect as 'model collapse' and show that it can occur in LLMs as well as in variational autoencoders (VAEs) and…
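
The tail-loss effect described in the abstract can be illustrated with a toy simulation. The sketch below is not the paper's experimental setup: it assumes a hypothetical Zipf-like vocabulary as the "original" data, a categorical model fitted by simple counting, and each generation trained only on samples drawn from the previous generation's model. All function names and parameters are illustrative assumptions.

```python
import random
from collections import Counter


def fit_categorical(samples):
    """Maximum-likelihood categorical fit: token -> relative frequency."""
    counts = Counter(samples)
    total = len(samples)
    return {token: count / total for token, count in counts.items()}


def sample_from(model, n, rng):
    """Draw n tokens from a fitted categorical model."""
    tokens = list(model)
    weights = [model[token] for token in tokens]
    return rng.choices(tokens, weights=weights, k=n)


def simulate_recursive_training(generations=30, n_samples=200,
                                vocab_size=50, seed=0):
    """Return the number of distinct tokens each generation's model can emit.

    Generation 0 is fitted on samples from an assumed long-tailed
    (Zipf-like) distribution; every later generation is fitted only on
    data sampled from the model before it.
    """
    rng = random.Random(seed)
    # Assumed "original" distribution: weight 1/(rank+1), so most tokens are rare.
    true_weights = [1.0 / (rank + 1) for rank in range(vocab_size)]
    data = rng.choices(range(vocab_size), weights=true_weights, k=n_samples)

    support_sizes = []
    for _ in range(generations + 1):
        model = fit_categorical(data)
        support_sizes.append(len(model))
        # The next generation sees only model-generated data.
        data = sample_from(model, n_samples, rng)
    return support_sizes
```

Calling `simulate_recursive_training()` returns one support size per generation. Because a token that receives zero samples in one generation has zero probability in every later model, the sequence can never increase, which mirrors the irreversibility the abstract describes. How quickly rare tokens vanish depends on the assumed sample size and vocabulary.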

Citation impact

Total citations: 547

FWCI: 171.42
Percentile: 100%
References: 9

Authors: 6

Topics & keywords

Keywords

Generative grammar
Generative model
Intuition
Computer science
Variety (cybernetics)
The Internet
Artificial intelligence
Data science

UN Sustainable Development Goals

Reduced inequalities
