🤖 AI Summary
This work addresses stochastic gradient optimization in federated learning when the available gradient information is delayed and biased. The authors propose a distributed stochastic optimization framework in which multiple local agents collaboratively minimize a global objective despite delays and biased gradient estimates. Their approach runs delayed stochastic gradient descent (SGD) with a predetermined decaying step-size schedule, deliberately avoiding delay-adaptive mechanisms. Theoretical analysis shows that this strategy attains the optimal SGD convergence rates in both non-convex and strongly convex settings, establishing that pre-chosen decaying step sizes suffice in delayed environments.
📝 Abstract
We propose a general framework for distributed stochastic optimization under delayed gradient models. In this setting, $n$ local agents leverage their own data and computation to assist a central server in minimizing a global objective composed of the agents' local cost functions. Each agent is allowed to transmit stochastic estimates of its local gradient that are potentially biased and delayed. While prior work has advocated delay-adaptive step sizes for stochastic gradient descent (SGD) in the presence of delays, we demonstrate that a pre-chosen diminishing step size is sufficient and matches the performance of the adaptive scheme. Moreover, our analysis establishes that diminishing step sizes recover the optimal SGD rates for non-convex and strongly convex objectives.
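As a point of reference (the notation here is illustrative and not taken from the paper), the delayed SGD update under study can be sketched as

$$
x_{t+1} \;=\; x_t \;-\; \eta_t\, \tilde g\big(x_{t-\tau_t}\big), \qquad \eta_t \searrow 0 \ \text{fixed in advance},
$$

where $\tau_t$ denotes the (possibly time-varying) delay, $\tilde g$ a possibly biased stochastic gradient of the global objective, and $\eta_t$ the pre-chosen diminishing step size, e.g., $\eta_t \propto 1/\sqrt{t}$ in the non-convex case or $\eta_t \propto 1/t$ in the strongly convex case; the paper's precise schedule and assumptions may differ.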