More Effective Distributed ML via a Stale Synchronous Parallel Parameter Server

Ho, Qirong; Cipar, James; Cui, Henggang; Kim, Jin Kyu; Lee, Seunghak; Gibbons, Phillip B.; Gibson, Garth A.; Ganger, Gregory R.; Xing, Eric P.

doi:10.1184/r1/6475898

Article | PubMed | Jan 1, 2013 | Green OA

Carnegie Mellon University

Indexed in: DataCite, PubMed

Abstract

We propose a parameter server system for distributed ML, which follows a Stale Synchronous Parallel (SSP) model of computation that maximizes the time computational workers spend doing useful work on ML algorithms, while still providing correctness guarantees. The parameter server provides an easy-to-use shared interface for read/write access to an ML model's values (parameters and variables), and the SSP model allows distributed workers to read older, stale versions of these values from a local cache, instead of waiting to get them from a central storage. This significantly increases the proportion of time workers spend computing, as opposed to waiting. Furthermore, the SSP model ensures ML algorithm…
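The cached-read idea described in the abstract can be illustrated with a toy, single-process sketch. It assumes a staleness bound `s` (workers may run at most `s` clocks ahead of the slowest worker, and may reuse a cached value for up to `s` clocks), which the truncated abstract above does not spell out. The class `SSPTable` and its methods are hypothetical names chosen for illustration; this is not the paper's API, and it models a single scalar parameter rather than a distributed table.

```python
from dataclasses import dataclass, field


@dataclass
class SSPTable:
    """Toy model of a bounded-staleness parameter (hypothetical, illustrative only)."""
    num_workers: int
    staleness: int                                # assumed staleness bound s
    value: float = 0.0                            # the "central storage" copy
    clocks: list = field(default_factory=list)    # per-worker iteration clocks
    caches: dict = field(default_factory=dict)    # worker -> (value, clock at fetch)

    def __post_init__(self):
        self.clocks = [0] * self.num_workers

    def must_wait(self, worker: int) -> bool:
        # A worker blocks only when it is more than `staleness` clocks
        # ahead of the slowest worker.
        return self.clocks[worker] - min(self.clocks) > self.staleness

    def read(self, worker: int) -> float:
        # Serve from the local cache while the cached copy is at most
        # `staleness` clocks old; otherwise refetch from central storage.
        cached = self.caches.get(worker)
        if cached is not None and self.clocks[worker] - cached[1] <= self.staleness:
            return cached[0]
        self.caches[worker] = (self.value, self.clocks[worker])
        return self.value

    def update(self, worker: int, delta: float) -> None:
        # Additive update applied to the central copy.
        self.value += delta

    def tick(self, worker: int) -> None:
        # Mark the end of one iteration for this worker.
        self.clocks[worker] += 1
```

In this simplification, a worker keeps computing on a slightly old value instead of fetching on every read, and the staleness bound limits both how old that value may be and how far any worker can run ahead of the others.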

Citation impact

Total citations: 554

FWCI: 35.44
Percentile: 100%
References: 30

Authors: 9

Keywords

Correctness
Computer science
Asynchronous communication
Parallel computing
Computation
Limiting
Cache
Distributed computing

UN Sustainable Development Goals

Decent work and economic growth