AI Summary
This work addresses the communication bottleneck in decentralized optimization caused by frequent synchronization between nodes. To mitigate this issue, the authors propose Overlapping Local Decentralized SGD (OLDSGD), a novel method that, for the first time, introduces a computation-communication overlap mechanism into decentralized training. While maintaining the same average update frequency as Local SGD, OLDSGD reduces network idle time and avoids communication-induced stalls. Theoretical analysis shows that OLDSGD achieves the same iteration complexity as standard Local Decentralized SGD for non-convex objectives. Empirical evaluations demonstrate that OLDSGD consistently accelerates wall-clock convergence across a range of communication-delay scenarios.
Abstract
Decentralized optimization has emerged as a critical paradigm for distributed learning, enabling scalable training while preserving data privacy through peer-to-peer collaboration. However, existing methods often suffer from communication bottlenecks due to frequent synchronization between nodes. We present Overlapping Local Decentralized SGD (OLDSGD), a novel approach that accelerates decentralized training by overlapping computation with communication, significantly reducing network idle time. Through a carefully designed update rule, OLDSGD preserves the same average update as Local SGD while avoiding communication-induced stalls. Theoretically, we establish non-asymptotic convergence rates for smooth non-convex objectives, showing that OLDSGD retains the same iteration complexity as standard Local Decentralized SGD while improving per-iteration runtime. Empirical results demonstrate OLDSGD's consistent improvements in wall-clock convergence under different levels of communication delay. Requiring only minimal modifications to existing frameworks, OLDSGD offers a practical route to faster decentralized learning without sacrificing theoretical guarantees.
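The core idea — launching gossip communication in the background while local SGD steps continue, then folding both results together — can be illustrated with a minimal single-machine sketch. This is an illustrative simplification under assumed semantics, not the paper's exact update rule: `overlapped_round`, the thread-based "communication", and the stale-average correction are all hypothetical stand-ins for a real networked implementation.

```python
import threading
import numpy as np

def local_sgd_step(x, grad_fn, lr=0.1):
    """One plain local SGD step on parameters x."""
    return x - lr * grad_fn(x)

def overlapped_round(x, neighbor_x, grad_fn, local_steps=4, lr=0.1):
    """One overlapped round (illustrative, not the paper's exact rule):
    start gossip averaging in a background thread, keep computing local
    SGD steps meanwhile, then add the local progress made during the
    overlap on top of the averaged parameters."""
    result = {}

    def communicate():
        # Simulated gossip: average with a (possibly stale) neighbor copy.
        result["avg"] = 0.5 * (x + neighbor_x)

    t = threading.Thread(target=communicate)
    t.start()                      # communication proceeds concurrently
    x_local = x
    for _ in range(local_steps):   # computation is not blocked
        x_local = local_sgd_step(x_local, grad_fn, lr)
    t.join()
    # Fold the overlap-period local progress into the gossip average.
    return result["avg"] + (x_local - x)

# Toy objective f(x) = ||x||^2 / 2, so grad(x) = x.
grad = lambda v: v
x0 = np.array([1.0, -2.0])
x1 = overlapped_round(x0, np.array([0.5, 0.5]), grad)
```

On this toy quadratic, one overlapped round both pulls the parameters toward the neighbor's copy and makes gradient progress, so `x1` is closer to the optimum than `x0`; in a real system the background thread would be a non-blocking collective (e.g. an asynchronous gossip exchange) rather than an in-process average.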