On the Convergence of FedAvg on Non-IID Data
Peking University · Stevens Institute of Technology
Abstract
Federated learning enables a large amount of edge computing devices to jointly learn a model without data sharing. As a leading algorithm in this setting, Federated Averaging (\texttt{FedAvg}) runs Stochastic Gradient Descent (SGD) in parallel on a small subset of the total devices and averages the sequences only once in a while. Despite its simplicity, it lacks theoretical guarantees under realistic settings. In this paper, we analyze the convergence of \texttt{FedAvg} on non-iid data and establish a convergence rate of $\mathcal{O}(\frac{1}{T})$ for strongly convex and smooth problems, where $T$ is the number of SGDs. Importantly, our bound demonstrates a trade-off between communication-efficiency and…
Citation impact
- FWCI
- —
- Percentile
- —
- References
- 48
Authors
5Topics & keywords
- Convergence (economics)
- Rate of convergence
- Computer science
- Regular polygon
- Stochastic gradient descent
- Gradient descent
- Enhanced Data Rates for GSM Evolution
- Binary logarithm