Training Deep Nets with Sublinear Memory Cost

Chen, Tianqi; Xu, Bing; Zhang, Chiyuan; Guestrin, Carlos

doi:10.48550/arxiv.1604.06174

Preprint, arXiv (Cornell University), Apr 21, 2016, Green OA

Indexed in: arXiv, DataCite

Abstract

We propose a systematic approach to reduce the memory consumption of deep neural network training. Specifically, we design an algorithm that costs O(sqrt(n)) memory to train an n-layer network, with only the computational cost of an extra forward pass per mini-batch. As many state-of-the-art models hit the upper bound of GPU memory, our algorithm allows deeper and more complex models to be explored, and helps advance innovation in deep learning research. We focus on reducing the memory cost of storing the intermediate feature maps and gradients during training. Computation graph analysis is used for automatic in-place operation and memory sharing optimizations. We show that it is possible to trade…
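The O(sqrt(n)) claim in the abstract can be illustrated with a toy example. The sketch below is not the authors' implementation; it assumes a chain of n scalar layers of the form tanh(w * x), and every function name in it is hypothetical. It keeps only the input to each segment of roughly sqrt(n) layers during the forward pass, then recomputes the activations inside one segment at a time during the backward pass, which amounts to about one extra forward pass in total.

```python
import math


def layer_forward(w, x):
    # Toy layer (an assumption for illustration): y = tanh(w * x).
    return math.tanh(w * x)


def layer_backward(w, x, grad_out):
    # Returns (grad wrt input x, grad wrt weight w) for y = tanh(w * x).
    y = math.tanh(w * x)
    local = 1.0 - y * y
    return grad_out * w * local, grad_out * x * local


def full_backprop(weights, x0):
    # Baseline: store every activation, so memory grows linearly with n.
    acts = [x0]
    for w in weights:
        acts.append(layer_forward(w, acts[-1]))
    grad = 1.0
    grads_w = [0.0] * len(weights)
    for i in reversed(range(len(weights))):
        grad, grads_w[i] = layer_backward(weights[i], acts[i], grad)
    return acts[-1], grads_w, len(acts)


def checkpointed_backprop(weights, x0):
    # Keep one checkpoint per segment of k ~ sqrt(n) layers.
    n = len(weights)
    k = max(1, math.isqrt(n))
    checkpoints = {}
    x = x0
    for i, w in enumerate(weights):
        if i % k == 0:
            checkpoints[i] = x  # input to the segment starting at layer i
        x = layer_forward(w, x)
    out = x

    grad = 1.0
    grads_w = [0.0] * n
    peak = 0  # rough count of values held at once
    for start in reversed(range(0, n, k)):
        end = min(start + k, n)
        # Recompute the inputs of the layers in this segment only.
        seg = [checkpoints[start]]
        for i in range(start, end - 1):
            seg.append(layer_forward(weights[i], seg[-1]))
        peak = max(peak, len(checkpoints) + len(seg))
        for i in reversed(range(start, end)):
            grad, grads_w[i] = layer_backward(weights[i], seg[i - start], grad)
    return out, grads_w, peak


weights = [0.5 + 0.01 * i for i in range(100)]
out_full, grads_full, stored_full = full_backprop(weights, 0.3)
out_ckpt, grads_ckpt, stored_ckpt = checkpointed_backprop(weights, 0.3)
print(stored_full, stored_ckpt)
```

With n = 100 layers the baseline holds n + 1 values while the checkpointed version holds on the order of 2 * sqrt(n), and both produce the same gradients. Real feature maps are tensors rather than scalars, so the saving applies to whole activation buffers.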

Citation impact

Total citations: 539

FWCI: —
Percentile: —
References: 19

Authors: 4

Topics & keywords

Keywords

Sublinear function
Computer science
Training (meteorology)
Parallel computing
Mathematics
Discrete mathematics