Achieving Linear Speedup with ProxSkip in Distributed Stochastic Optimization

📅 2023-10-12
📈 Citations: 1
Influential: 0
🤖 AI Summary
ProxSkip alleviates communication bottlenecks and accommodates data heterogeneity in distributed optimization, yet its theoretical analysis has long been restricted to strongly convex settings, and linear speedup (convergence rate scaling inversely with the number of nodes $n$) remains unproven, especially in nonconvex regimes. This work bridges these gaps: (i) it establishes the first convergence guarantees for ProxSkip under nonconvex and general convex objectives, rigorously proving linear speedup; (ii) it proposes a network-agnostic stepsize scheme that enhances robustness in strongly convex settings; and (iii) it derives optimal communication complexity, $O(p\sigma^2/(n\varepsilon^2))$ for nonconvex/convex cases and $O(p\sigma^2/(n\varepsilon))$ for strongly convex ones, demonstrating inverse proportionality between communication cost and $n$, while supporting progressive compression driven by the client participation probability.
📝 Abstract
The ProxSkip algorithm for distributed optimization is gaining increasing attention due to its proven benefits in accelerating communication complexity while maintaining robustness against data heterogeneity. However, existing analyses of ProxSkip are limited to the strongly convex setting and do not achieve linear speedup, where convergence performance increases linearly with respect to the number of nodes. So far, questions remain open about how ProxSkip behaves in the non-convex setting and whether linear speedup is achievable. In this paper, we revisit distributed ProxSkip and address both questions. We demonstrate that the leading communication complexity of ProxSkip is $\mathcal{O}(\frac{p\sigma^2}{n\epsilon^2})$ for non-convex and convex settings, and $\mathcal{O}(\frac{p\sigma^2}{n\epsilon})$ for the strongly convex setting, where $n$ represents the number of nodes, $p$ denotes the probability of communication, $\sigma^2$ signifies the level of stochastic noise, and $\epsilon$ denotes the desired accuracy level. This result illustrates that ProxSkip achieves linear speedup and can asymptotically reduce communication overhead proportional to the probability of communication. Additionally, for the strongly convex setting, we further prove that ProxSkip can achieve linear speedup with network-independent stepsizes.
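The abstract's setting can be made concrete with a minimal sketch of distributed ProxSkip, where the proximal step is averaging across nodes and is performed only with probability $p$. This is a toy simulation on heterogeneous quadratics with exact gradients, not the paper's experimental setup; the problem sizes, stepsize, and data are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy heterogeneous quadratics: node i holds f_i(x) = 0.5 * ||A_i x - b_i||^2.
n, d = 4, 5                       # number of nodes, dimension (illustrative)
A = [rng.standard_normal((10, d)) for _ in range(n)]
b = [rng.standard_normal(10) for _ in range(n)]

def grad(i, x):
    """Exact local gradient of f_i at x (no stochastic noise in this sketch)."""
    return A[i].T @ (A[i] @ x - b[i])

# Reference minimizer of sum_i f_i, i.e. least squares on the stacked system.
x_star = np.linalg.lstsq(np.vstack(A), np.concatenate(b), rcond=None)[0]

gamma, p, T = 0.01, 0.2, 5000     # stepsize, communication probability, iterations
x = np.zeros((n, d))              # local iterates x_i
h = np.zeros((n, d))              # control variates h_i (their mean stays zero)

for _ in range(T):
    g = np.stack([grad(i, x[i]) for i in range(n)])
    x_hat = x - gamma * (g - h)                   # local step with control variate
    if rng.random() < p:                          # communicate only w.p. p
        # Prox of the consensus constraint = averaging across nodes.
        x = np.tile((x_hat - (gamma / p) * h).mean(axis=0), (n, 1))
        h = h + (p / gamma) * (x - x_hat)         # control-variate update
    else:
        x = x_hat                                 # skip communication

print(np.linalg.norm(x[0] - x_star))              # distance to the minimizer
```

On average only $pT$ of the $T$ iterations trigger communication, which is the mechanism behind the overhead reduction proportional to $p$ discussed above.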
Problem

Research questions and friction points this paper is trying to address.

Extending ProxSkip to non-convex optimization settings
Achieving linear speedup in distributed optimization
Reducing communication overhead proportionally with communication probability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Proves that ProxSkip achieves linear speedup in distributed optimization
Reduces communication overhead in proportion to the communication probability
Achieves linear speedup with network-independent stepsizes in the strongly convex setting
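The linear-speedup claim can be read directly off the stated bound: plugging into the non-convex/convex communication complexity $\mathcal{O}(\frac{p\sigma^2}{n\epsilon^2})$, doubling the number of nodes $n$ halves the communication budget for a fixed target accuracy. A quick numeric check, with $p$, $\sigma^2$, and $\epsilon$ set to placeholder values not taken from the paper:

```python
# Non-convex/convex communication-complexity bound from the abstract:
# O(p * sigma^2 / (n * epsilon^2)). Constants here are illustrative only.
def comm_bound(n, p=0.2, sigma2=1.0, eps=0.1):
    return p * sigma2 / (n * eps**2)

for n in (1, 2, 4, 8):
    print(n, comm_bound(n))   # halves each time n doubles
```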