Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training

Lin, Yujun; Han, Song; Mao, Huizi; Wang, Yu; Dally, William J.

doi:10.48550/arxiv.1712.01887

Preprint · arXiv (Cornell University) · Dec 5, 2017 · Green OA

Tsinghua University · Stanford University

Indexed in: arXiv, DataCite

Abstract

Large-scale distributed training requires significant communication bandwidth for gradient exchange that limits the scalability of multi-node training, and requires expensive high-bandwidth network infrastructure. The situation gets even worse with distributed training on mobile devices (federated learning), which suffers from higher latency, lower throughput, and intermittent poor connections. In this paper, we find 99.9% of the gradient exchange in distributed SGD is redundant, and propose Deep Gradient Compression (DGC) to greatly reduce the communication bandwidth. To preserve accuracy during compression, DGC employs four methods: momentum correction, local gradient clipping, momentum factor masking, and…
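The abstract names the techniques but gives no formulas. The sketch below shows one worker-side step under our reading of the method, and it is not the authors' implementation. It clips the local gradient by L2 norm, applies momentum correction by accumulating the momentum-updated velocity `u` into a local buffer `v` (rather than accumulating raw gradients), transmits only the largest-magnitude fraction of `v` (default `keep_ratio=0.001`, matching the "99.9% redundant" figure), and then applies momentum factor masking by zeroing both `u` and `v` at the transmitted positions. The names `dgc_step`, `clip_by_norm`, `keep_ratio`, and `max_norm` are hypothetical. Selection uses an exact sort for clarity, and the truncated final technique in the abstract is not modeled.

```python
import math


def clip_by_norm(grad, max_norm):
    """Local gradient clipping: rescale grad so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm > max_norm and norm > 0:
        scale = max_norm / norm
        return [g * scale for g in grad]
    return list(grad)


def dgc_step(grad, u, v, momentum=0.9, keep_ratio=0.001, max_norm=None):
    """One worker-side sparsification step (illustrative sketch only).

    grad: local gradient for this iteration
    u:    local momentum (velocity) buffer, updated in place
    v:    local accumulation buffer, updated in place
    Returns (indices, values): the sparse update this worker would transmit.
    """
    # Optional local gradient clipping before anything is accumulated.
    if max_norm is not None:
        grad = clip_by_norm(grad, max_norm)

    # Momentum correction: accumulate the velocity u into v, not the raw gradient.
    for i, g in enumerate(grad):
        u[i] = momentum * u[i] + g
        v[i] = v[i] + u[i]

    # Keep only the k largest-magnitude accumulated entries (at least one).
    k = max(1, int(len(v) * keep_ratio))
    top = sorted(range(len(v)), key=lambda i: abs(v[i]), reverse=True)[:k]
    top.sort()
    values = [v[i] for i in top]

    # Sent entries leave the accumulator; momentum factor masking also
    # clears the momentum at those positions so stale velocity is not reapplied.
    for i in top:
        v[i] = 0.0
        u[i] = 0.0
    return top, values


# Usage: a 4-element gradient, keeping half the entries for visibility.
u = [0.0] * 4
v = [0.0] * 4
idx, vals = dgc_step([0.1, -2.0, 0.3, 1.5], u, v, keep_ratio=0.5)
print(idx, vals)  # indices and values this worker would send
print(u, v)       # unsent entries stay in the local buffers
```

Entries that are not sent are not discarded: they remain in `v` and keep growing until they are large enough to be selected in a later step, which is why aggressive sparsity need not simply throw gradient information away.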

Citation impact

Total citations: 646
References: 37
Authors: 5

Topics & keywords

Keywords

Computer science
Scalability
Bandwidth (computing)
Deep learning
Stochastic gradient descent
Compression ratio
Artificial intelligence
Computer engineering

UN Sustainable Development Goals

Industry, innovation and infrastructure
