Speeding Up Distributed Machine Learning Using Codes

Lee, Kangwook; Lam, Maximilian; Pedarsani, Ramtin; Papailiopoulos, Dimitris; Ramchandran, Kannan

doi:10.1109/tit.2017.2736066

Article · IEEE Transactions on Information Theory · Aug 4, 2017 · Green OA

Korea Advanced Institute of Science and Technology · University of California, Berkeley · +2 more institutions

Indexed in: arXiv, Crossref

Abstract

Codes are widely used in many engineering applications to offer robustness against noise. In large-scale systems, several types of noise can affect the performance of distributed machine learning algorithms (straggler nodes, system failures, or communication bottlenecks), but there has been little interaction cutting across codes, machine learning, and distributed systems. In this paper, we provide theoretical insights on how coded solutions can achieve significant gains compared with uncoded ones. We focus on two of the most basic building blocks of distributed learning algorithms: matrix multiplication and data shuffling. For matrix multiplication, we use codes to alleviate the effect of…
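The abstract is cut off before it describes the coding scheme, so the following is only a generic illustration of how a code can add redundancy to a distributed matrix-vector product, not the construction used in the paper. It is a minimal sketch assuming three hypothetical workers and a single parity block; the names `encode`, `decode`, and `matvec` are invented for this example.

```python
from itertools import combinations


def matvec(rows, x):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in rows]


def encode(A):
    """Split A into two row blocks and add one parity block (A1 + A2).

    Assumption for this sketch: A has an even number of rows.
    """
    half = len(A) // 2
    A1, A2 = A[:half], A[half:]
    parity = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(A1, A2)]
    return [A1, A2, parity]


def decode(results):
    """Recover A x from any two worker results, keyed by worker index 0, 1, 2."""
    if 0 in results and 1 in results:
        y1, y2 = results[0], results[1]
    elif 0 in results:  # have A1 x and (A1 + A2) x
        y1 = results[0]
        y2 = [p - a for p, a in zip(results[2], y1)]
    else:  # have A2 x and (A1 + A2) x
        y2 = results[1]
        y1 = [p - b for p, b in zip(results[2], y2)]
    return y1 + y2


A = [[1, 2], [3, 4], [5, 6], [7, 8]]
x = [1, -1]
blocks = encode(A)      # one block per hypothetical worker
expected = matvec(A, x)

# Simulate every single-worker delay: decode from whichever two workers finish.
for finished in combinations(range(3), 2):
    results = {i: matvec(blocks[i], x) for i in finished}
    assert decode(results) == expected
```

In this toy setup the full product is available as soon as any two of the three workers respond, at the cost of one extra block of computation.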

Citation impact

Total citations: 866

FWCI: 71.71
Percentile: 100%
References: 141

Authors: 5

Topics & keywords

Keywords

Shuffling
Computer science
Matrix multiplication
Cache
Multicast
Distributed data store
Multiplication (music)
Distributed computing
