Preprint · arXiv (Cornell University) · Dec 5, 2017 · Green OA

Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training

Tsinghua University · Stanford University

Indexed in: arXiv, DataCite

Abstract

Large-scale distributed training requires significant communication bandwidth for gradient exchange that limits the scalability of multi-node training, and requires expensive high-bandwidth network infrastructure. The situation gets even worse with distributed training on mobile devices (federated learning), which suffers from higher latency, lower throughput, and intermittent poor connections. In this paper, we find 99.9% of the gradient exchange in distributed SGD is redundant, and propose Deep Gradient Compression (DGC) to greatly reduce the communication bandwidth. To preserve accuracy during compression, DGC employs four methods: momentum correction, local gradient clipping, momentum factor masking, and…
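The abstract names DGC's accuracy-preserving techniques but gives no formulas. The following is a minimal single-worker sketch of how the first three might fit together. It assumes a formulation in which each worker keeps a momentum buffer `u` and an accumulation buffer `v`, sends only the top `density` fraction of entries of `v` by magnitude, and keeps the rest as an unsent residual. It also assumes that local gradient clipping scales a global norm threshold by `num_nodes ** -0.5`. The names `DGCWorker`, `clip_local`, `density`, and `max_norm` are hypothetical. Exact top-k sorting stands in for whatever threshold-selection method the paper uses. The fourth technique, elided by the abstract's "…", is not modeled.

```python
import math
from typing import List, Tuple


def clip_local(grad: List[float], max_norm: float, num_nodes: int) -> List[float]:
    """Local gradient clipping (assumed form): clip this worker's gradient
    to a threshold of max_norm / sqrt(num_nodes) before accumulation."""
    threshold = max_norm / math.sqrt(num_nodes)
    norm = math.sqrt(sum(g * g for g in grad))
    if norm > threshold:
        scale = threshold / norm
        return [g * scale for g in grad]
    return list(grad)


class DGCWorker:
    """Hypothetical per-worker state for sparsified gradient exchange."""

    def __init__(self, size: int, momentum: float = 0.9, density: float = 0.001):
        self.m = momentum
        self.density = density   # fraction of entries sent per step (0.001 = 99.9% dropped)
        self.u = [0.0] * size    # local momentum buffer
        self.v = [0.0] * size    # locally accumulated updates not yet sent

    def step(self, grad: List[float]) -> List[Tuple[int, float]]:
        # Momentum correction (assumed form): accumulate the momentum-corrected
        # update u rather than the raw gradient, so unsent entries keep momentum.
        for i, g in enumerate(grad):
            self.u[i] = self.m * self.u[i] + g
            self.v[i] += self.u[i]

        # Exact top-k by magnitude; at least one entry is always sent.
        k = max(1, int(len(self.v) * self.density))
        top = sorted(range(len(self.v)), key=lambda i: abs(self.v[i]), reverse=True)[:k]
        sparse = [(i, self.v[i]) for i in sorted(top)]

        for i in top:
            self.v[i] = 0.0  # sent, so clear the residual
            self.u[i] = 0.0  # momentum factor masking: stop stale momentum on sent entries
        return sparse


if __name__ == "__main__":
    worker = DGCWorker(size=10, density=0.2)  # k = 2 of 10 entries
    grad = [0.1, -3.0, 0.2, 0.0, 2.5, -0.05, 0.0, 0.3, -0.1, 0.0]
    sent = worker.step(clip_local(grad, max_norm=10.0, num_nodes=4))
    print(sent)  # (index, value) pairs that would be exchanged
```

In this sketch, only `k` (index, value) pairs leave the worker each step. Small entries are not discarded. They stay in `v` and keep accumulating until they are large enough to be selected. Under the assumed design, that is what lets aggressive sparsification preserve accuracy.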

Citation impact

Total citations: 646
References: 37

Authors

Number of authors: 5

Topics & keywords

Keywords
  • Computer science
  • Scalability
  • Bandwidth (computing)
  • Deep learning
  • Stochastic gradient descent
  • Compression ratio
  • Artificial intelligence
  • Computer engineering
UN Sustainable Development Goals
  • Industry, innovation and infrastructure