Explaining neural scaling laws

Bahri, Yasaman; Dyer, Ethan; Kaplan, Jared; Lee, Jaehoon; Sharma, Utkarsh

doi:10.1073/pnas.2311878121

Article · Proceedings of the National Academy of Sciences · Jun 24, 2024 · Hybrid OA

Google (United States) · Johns Hopkins University

Indexed in: arXiv, Crossref, PubMed

Abstract

The population loss of trained deep neural networks often follows precise power-law scaling relations with either the size of the training dataset or the number of parameters in the network. We propose a theory that explains the origins of and connects these scaling laws. We identify variance-limited and resolution-limited scaling behavior for both dataset and model size, for a total of four scaling regimes. The variance-limited scaling follows simply from the existence of a well-behaved infinite data or infinite width limit, while the resolution-limited regime can be explained by positing that models are effectively resolving a smooth data manifold. In the large width limit, this can be equivalently obtained…
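As a rough illustration of the power-law relation the abstract describes, the sketch below fits loss ≈ a · size^(-alpha) by linear regression in log-log space. The function name `fit_power_law`, the synthetic data, and the exponent 0.5 are assumptions made for this example only; none of them come from the paper.

```python
import numpy as np

def fit_power_law(sizes, losses):
    """Fit loss ~ a * size**(-alpha) via least squares on log-log data.

    Hypothetical helper for illustration; not the authors' method.
    """
    # A power law is a straight line in log-log coordinates:
    # log(loss) = log(a) - alpha * log(size)
    slope, intercept = np.polyfit(np.log(sizes), np.log(losses), deg=1)
    return np.exp(intercept), -slope

# Synthetic, noise-free losses; prefactor 2.0 and exponent 0.5 are made up.
sizes = np.array([1e3, 1e4, 1e5, 1e6])
losses = 2.0 * sizes ** -0.5

a, alpha = fit_power_law(sizes, losses)
print(f"a = {a:.3f}, alpha = {alpha:.3f}")
```

Here `sizes` can stand for either dataset size or parameter count. On real measurements the fit would only be meaningful over the range where the loss curve is actually linear in log-log coordinates.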

Citation impact

Total citations: 118
FWCI: 31.78
Percentile: 100%
References: 105

Authors

Yasaman Bahri (Corresponding) · Google (United States)
Ethan Dyer (Corresponding) · Google (United States)
Jared Kaplan (Corresponding) · Johns Hopkins University
Jaehoon Lee (Corresponding) · Google (United States)
Utkarsh Sharma (Corresponding) · Johns Hopkins University

Topics & keywords

Keywords

Scaling
Statistical physics
Range (aeronautics)
Variance (accounting)
Limit (mathematics)
Computer science
Scaling law
Power law

UN Sustainable Development Goals

Peace, Justice and Strong Institutions
