Speeding Up Distributed Machine Learning Using Codes
Korea Advanced Institute of Science and Technology · University of California, Berkeley · +2 more institutions
Abstract
Codes are widely used in many engineering applications to offer robustness against noise. In large-scale systems, there are several types of noise that can affect the performance of distributed machine learning algorithms (straggler nodes, system failures, or communication bottlenecks), but there has been little interaction cutting across codes, machine learning, and distributed systems. In this paper, we provide theoretical insights on how coded solutions can achieve significant gains compared with uncoded ones. We focus on two of the most basic building blocks of distributed learning algorithms: matrix multiplication and data shuffling. For matrix multiplication, we use codes to alleviate the effect of…
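As an illustration of what a coded solution for distributed matrix multiplication can look like, the following is a minimal sketch and not the paper's own construction. It assumes the goal is to tolerate slow or missing workers, and it assumes a real-valued Vandermonde generator matrix so that any k of the n coded results suffice to recover the product. The function names `encode_blocks` and `decode`, the sizes n = 5 and k = 3, and the choice of which workers respond are all hypothetical.

```python
import numpy as np

def encode_blocks(A, n, k):
    """Split A into k row blocks and build n coded blocks (assumed scheme)."""
    blocks = np.split(A, k, axis=0)  # raises ValueError if rows are not divisible by k
    # Vandermonde generator with distinct nodes 1..n: any k rows are invertible.
    G = np.vander(np.arange(1, n + 1), k, increasing=True).astype(float)
    coded = [sum(G[i, j] * blocks[j] for j in range(k)) for i in range(n)]
    return G, coded

def decode(G, results):
    """Recover A @ x from the results of any k workers.

    results maps a worker index to that worker's vector coded[i] @ x.
    """
    k = G.shape[1]
    if len(results) < k:
        raise ValueError("need results from at least k workers")
    idx = sorted(results)[:k]
    Y = np.stack([results[i] for i in idx])  # shape (k, rows_per_block)
    Z = np.linalg.solve(G[idx], Y)           # row j equals blocks[j] @ x
    return Z.reshape(-1)                     # concatenate the block products in order

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))
x = rng.standard_normal(4)

G, coded = encode_blocks(A, n=5, k=3)
# Pretend workers 1 and 3 never respond; only workers 0, 2, and 4 return.
results = {i: coded[i] @ x for i in (0, 2, 4)}
y = decode(G, results)
```

In this sketch each worker multiplies a block one third the size of A, and the result is available as soon as any three of the five workers finish, at the cost of a small linear solve for decoding.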
Citation impact
- FWCI: 71.71
- Percentile: 100%
- References: 141
Authors: 5
Topics & keywords
- Shuffling
- Computer science
- Matrix multiplication
- Cache
- Multicast
- Distributed data store
- Multiplication (music)
- Distributed computing