Scaling distributed machine learning with the parameter server

Li, Mu; Andersen, David G.; Park, Jun Woo; Smola, Alexander J.; Ahmed, Amr; Josifovski, Vanja; Long, James E.; Shekita, Eugene J.; Su, Bor-Yiing

doi:10.5555/2685048.2685095

Article · Operating Systems Design and Implementation · Oct 6, 2014 · Closed access


Carnegie Mellon University · Baidu (China) · +1 more institution

Abstract

We propose a parameter server framework for distributed machine learning problems. Both data and workloads are distributed over worker nodes, while the server nodes maintain globally shared parameters, represented as dense or sparse vectors and matrices. The framework manages asynchronous data communication between nodes, and supports flexible consistency models, elastic scalability, and continuous fault tolerance. To demonstrate the scalability of the proposed framework, we show experimental results on petabytes of real data with billions of examples and parameters on problems ranging from Sparse Logistic Regression to Latent Dirichlet Allocation and Distributed Sketching.
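The division of labor described in the abstract (workers hold data shards, servers hold the shared parameters as sparse vectors) can be illustrated with a minimal single-process sketch. This is not the paper's API: the class names `ParameterServer` and `Worker`, the `pull`/`push` methods, the learning rate, and the toy data are all assumptions made for illustration, and the workers here run sequentially rather than asynchronously across machines. The example trains a small sparse logistic regression model.

```python
import math
from collections import defaultdict


class ParameterServer:
    """Hypothetical server: holds shared parameters as a sparse key -> value map."""

    def __init__(self, learning_rate=0.1):
        self.weights = defaultdict(float)  # keys never pushed read as 0.0
        self.learning_rate = learning_rate

    def pull(self, keys):
        """Return the current values for the requested keys only."""
        return {k: self.weights[k] for k in keys}

    def push(self, gradients):
        """Apply a sparse gradient update sent by a worker."""
        for k, g in gradients.items():
            self.weights[k] -= self.learning_rate * g


class Worker:
    """Hypothetical worker: owns one data shard and computes local gradients."""

    def __init__(self, shard):
        # Each example is (sparse feature dict, label in {0, 1}).
        self.shard = shard

    def step(self, server):
        for features, label in self.shard:
            # Pull only the parameters this example touches.
            w = server.pull(features.keys())
            z = sum(w[k] * x for k, x in features.items())
            p = 1.0 / (1.0 + math.exp(-z))
            # Logistic-loss gradient, nonzero only on the example's features.
            grad = {k: (p - label) * x for k, x in features.items()}
            server.push(grad)


# Toy data (assumed): feature "a" co-occurs with label 1, "b" with label 0.
server = ParameterServer()
workers = [
    Worker([({"a": 1.0}, 1), ({"b": 1.0}, 0)]),
    Worker([({"a": 1.0, "c": 1.0}, 1), ({"b": 1.0, "c": 1.0}, 0)]),
]

for _ in range(20):
    for worker in workers:
        worker.step(server)
```

Because each worker pulls and pushes only the keys its examples touch, communication scales with the sparsity of the data rather than with the full parameter count. A real deployment would additionally need the asynchronous communication, consistency control, and fault tolerance that the abstract mentions, none of which this sketch attempts.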

Citation impact

Total citations: 1,101

FWCI: 49.80
Percentile: 100%
References: 36

Authors: 9

Topics & keywords

Keywords

Computer science
Scalability
Distributed computing
Asynchronous communication
Fault tolerance
Server
Petabyte
Consistency (knowledge bases)

UN Sustainable Development Goals

Decent work and economic growth
