Explaining neural scaling laws
Google (United States) · Johns Hopkins University
Abstract
The population loss of trained deep neural networks often follows precise power-law scaling relations with either the size of the training dataset or the number of parameters in the network. We propose a theory that explains the origins of and connects these scaling laws. We identify variance-limited and resolution-limited scaling behavior for both dataset and model size, for a total of four scaling regimes. The variance-limited scaling follows simply from the existence of a well-behaved infinite data or infinite width limit, while the resolution-limited regime can be explained by positing that models are effectively resolving a smooth data manifold. In the large width limit, this can be equivalently obtained…
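As a minimal sketch of what a power-law scaling relation means in practice, the snippet below estimates an exponent from (dataset size, loss) pairs with a straight-line fit in log-log space. It is not the authors' method. The helper name `fit_power_law`, the functional form `loss = a * size**(-alpha)` with no irreducible-loss offset, and the synthetic values (`a = 2.0`, `alpha = 0.5`) are all assumptions made for illustration.

```python
import numpy as np


def fit_power_law(sizes, losses):
    """Fit loss = a * size**(-alpha) by linear regression in log-log space.

    Hypothetical helper: assumes a pure power law with no constant offset.
    Returns the tuple (alpha, a).
    """
    # log(loss) = log(a) - alpha * log(size), so the slope is -alpha.
    slope, intercept = np.polyfit(np.log(sizes), np.log(losses), 1)
    return -slope, np.exp(intercept)


# Synthetic, noise-free data generated from a known power law (illustrative only).
sizes = np.array([1e3, 1e4, 1e5, 1e6])
losses = 2.0 * sizes ** -0.5

alpha, a = fit_power_law(sizes, losses)
print(f"alpha = {alpha:.3f}, a = {a:.3f}")
```

The same fit applies with parameter count or width in place of dataset size. On measured losses, a constant offset for irreducible loss usually has to be estimated as well, which this sketch omits.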
Citation impact
- FWCI: 31.78
- Percentile: 100%
- References: 105
Authors: 5
Topics & keywords
- Scaling
- Statistical physics
- Range (aeronautics)
- Variance (accounting)
- Limit (mathematics)
- Computer science
- Scaling law
- Power law
- Peace, Justice and strong institutions