Mixed Precision Training

Micikevicius, Paulius; Narang, Sharan; Alben, Jonah; Diamos, Gregory; Elsen, Erich; García, David; Ginsburg, Boris; Houston, Michael; Kuchaiev, Oleksii; Venkatesh, Ganesh; Wu, Hao

doi:10.48550/arxiv.1710.03740

Preprint | arXiv (Cornell University) | Oct 10, 2017 | Green OA

Indexed in: arXiv, DataCite

Abstract

Deep neural networks have enabled progress in a wide variety of applications. Growing the size of the neural network typically results in improved accuracy. As model sizes grow, the memory and compute requirements for training these models also increase. We introduce a technique to train deep neural networks using half-precision floating-point numbers. In our technique, weights, activations and gradients are stored in IEEE half-precision format. Half-precision floating-point numbers have limited numerical range compared to single-precision numbers. We propose two techniques to handle this loss of information. Firstly, we recommend maintaining a single-precision copy of the weights that accumulates the gradients…
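The following is a minimal sketch, not the authors' code, of why a single-precision copy of the weights can matter when gradients are stored in half precision. It assumes NumPy. The function name `apply_updates`, the starting weight of 1.0, and the update size of 1e-4 are hypothetical values chosen for illustration. Near 1.0, adjacent half-precision values are about 0.001 apart, so an update of 1e-4 is rounded away when the weight itself is kept in half precision, while a single-precision accumulator retains it.

```python
import numpy as np

def apply_updates(weight_dtype, steps=1000, update_value=1e-4):
    """Add the same small update `steps` times to a weight that starts at 1.0.

    `weight_dtype` is the precision in which the accumulated weight is kept.
    The update itself is stored in half precision in both cases.
    """
    w = weight_dtype(1.0)
    update = np.float16(update_value)  # half-precision update (illustrative value)
    for _ in range(steps):
        # Accumulate in the chosen precision and round back to that precision.
        w = weight_dtype(w + weight_dtype(update))
    return float(w)

# Weight kept only in half precision: each update is smaller than half the
# spacing between adjacent float16 values near 1.0, so it is rounded away.
fp16_only = apply_updates(np.float16)

# Weight kept in a single-precision copy: the small updates accumulate.
fp32_master = apply_updates(np.float32)

print("float16 accumulator:", fp16_only)
print("float32 accumulator:", fp32_master)
```

In this toy setting the half-precision accumulator never moves from its starting value, while the single-precision copy ends close to 1.1. A half-precision working copy for the forward and backward passes could be obtained at each step by casting, for example `np.float16(w)`.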

Citation impact

Total citations: 877
References: 27

Authors: 11

Topics & keywords

Keywords

Computer science
Artificial neural network
Speedup
Single-precision floating-point format
Deep learning
Computation
Deep neural networks
Floating point
