🤖 AI Summary
This work addresses stochastic gradient optimization in federated learning when the available gradient information is delayed and biased. The authors propose a distributed stochastic optimization framework in which multiple local agents collaboratively minimize a global objective despite delays and biased gradient estimates. Their approach runs delayed stochastic gradient descent (SGD) with a predetermined decaying step-size schedule, deliberately avoiding delay-adaptive mechanisms. Theoretical analysis shows that this strategy attains the optimal SGD convergence rates in both non-convex and strongly convex settings, establishing that pre-chosen decaying step sizes suffice in delayed environments.
📝 Abstract
We propose a general framework for distributed stochastic optimization under delayed gradient models. In this setting, $n$ local agents leverage their own data and computation to assist a central server in minimizing a global objective composed of the agents' local cost functions. Each agent is allowed to transmit stochastic estimates of its local gradient that are potentially biased and delayed. While prior work has advocated delay-adaptive step sizes for stochastic gradient descent (SGD) in the presence of delays, we demonstrate that a pre-chosen diminishing step size is sufficient and matches the performance of the adaptive scheme. Moreover, our analysis establishes that diminishing step sizes recover the optimal SGD rates for non-convex and strongly convex objectives.
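As a point of reference (the notation here is illustrative and not taken from the paper), the delayed SGD update under study can be sketched as

$$
x_{t+1} \;=\; x_t \;-\; \eta_t\, \tilde g\big(x_{t-\tau_t}\big), \qquad \eta_t \searrow 0 \ \text{fixed in advance},
$$

where $\tau_t$ denotes the (possibly time-varying) delay, $\tilde g$ a possibly biased stochastic gradient of the global objective, and $\eta_t$ the pre-chosen diminishing step size, e.g., $\eta_t \propto 1/\sqrt{t}$ in the non-convex case or $\eta_t \propto 1/t$ in the strongly convex case; the paper's precise schedule and assumptions may differ.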