Article · Proceedings of the National Academy of Sciences · Jun 24, 2024 · Hybrid OA

Explaining neural scaling laws

Google (United States) · Johns Hopkins University

Indexed in: arXiv, Crossref, PubMed

Abstract

The population loss of trained deep neural networks often follows precise power-law scaling relations with either the size of the training dataset or the number of parameters in the network. We propose a theory that explains the origins of and connects these scaling laws. We identify variance-limited and resolution-limited scaling behavior for both dataset and model size, for a total of four scaling regimes. The variance-limited scaling follows simply from the existence of a well-behaved infinite data or infinite width limit, while the resolution-limited regime can be explained by positing that models are effectively resolving a smooth data manifold. In the large width limit, this can be equivalently obtained…
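The power-law relation described in the abstract can be illustrated with a minimal sketch, assuming the loss takes the form L(D) ≈ a · D^(-alpha) in dataset size D. The function name `fit_power_law`, the synthetic data, and the values of `a` and `alpha` below are hypothetical choices for illustration only; they are not taken from the paper. A power law is a straight line in log-log space, so the exponent can be estimated with an ordinary least-squares line fit.

```python
import numpy as np


def fit_power_law(sizes, losses):
    """Fit losses ~ a * sizes**(-alpha) by least squares in log-log space.

    Returns (alpha, a). Hypothetical helper, not the paper's method.
    """
    log_x = np.log(np.asarray(sizes, dtype=float))
    log_y = np.log(np.asarray(losses, dtype=float))
    # polyfit with deg=1 returns [slope, intercept] of the best-fit line.
    slope, intercept = np.polyfit(log_x, log_y, deg=1)
    return -slope, np.exp(intercept)


# Synthetic, noise-free data (made-up values, for illustration only).
dataset_sizes = np.array([1e3, 1e4, 1e5, 1e6])
true_alpha, true_a = 0.5, 10.0
losses = true_a * dataset_sizes ** (-true_alpha)

alpha, a = fit_power_law(dataset_sizes, losses)
print(f"alpha = {alpha:.3f}, a = {a:.3f}")
```

The same fit applies with the number of parameters in place of dataset size. On real measurements the fit would be noisy, and a loss that plateaus at a nonzero floor would need that floor subtracted before taking logs.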

Citation impact

Total citations: 118
FWCI: 31.78
Percentile: 100%
References: 105

Authors

5 authors

Topics & keywords

Keywords
  • Scaling
  • Statistical physics
  • Range (aeronautics)
  • Variance (accounting)
  • Limit (mathematics)
  • Computer science
  • Scaling law
  • Power law
UN Sustainable Development Goals
  • Peace, Justice and strong institutions