AI Summary
This work addresses the high communication overhead and slow convergence in heterogeneous distributed logistic regression. We propose and theoretically analyze the Local Gradient Descent (Local GD) algorithm. Our key contribution is the first proof that, under a large step size (η ≫ 1/K), Local GD achieves a convergence rate of O(1/KR), breaking the conventional Ω(1/R) bottleneck. This acceleration stems from a genuine benefit of increasing the number K of local updates, overturning the standard analytical paradigm requiring η ≤ 1/K. By unifying nonconvex and strongly convex analysis frameworks and explicitly modeling data heterogeneity, we establish the first rigorous convergence theory that jointly optimizes communication efficiency and statistical heterogeneity in federated learning. Our analysis yields the tightest known theoretical convergence bound for distributed logistic regression under heterogeneous data settings.
Abstract
We analyze two variants of Local Gradient Descent applied to distributed logistic regression with heterogeneous, separable data and show convergence at the rate $O(1/KR)$ for $K$ local steps per round and a sufficiently large number of communication rounds $R$. In contrast, all existing convergence guarantees for Local GD applied to any problem are at least $\Omega(1/R)$, meaning they fail to show the benefit of local updates. The key to our improved guarantee is showing progress on the logistic regression objective when using a large stepsize $\eta \gg 1/K$, whereas prior analyses depend on $\eta \leq 1/K$.
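The setting the abstract describes can be sketched in a few lines of NumPy: each of several clients runs $K$ local gradient steps on its own logistic loss from the shared iterate, the server averages the resulting iterates, and this repeats for $R$ rounds with a stepsize $\eta$ larger than $1/K$. This is a minimal illustrative sketch of plain Local GD, not the paper's exact algorithm or experimental setup; all names and the toy data below are assumptions.

```python
import numpy as np

def logistic_grad(w, X, y):
    # Gradient of the average logistic loss log(1 + exp(-y * x.w)),
    # with labels y in {-1, +1}. The factor 0.5*(1 - tanh(z/2)) is a
    # numerically stable form of the sigmoid 1/(1 + exp(z)).
    z = y * (X @ w)
    sig = 0.5 * (1.0 - np.tanh(z / 2.0))
    return -(X * (y * sig)[:, None]).mean(axis=0)

def logistic_loss(w, X, y):
    # Average logistic loss, computed stably via logaddexp.
    return np.logaddexp(0.0, -y * (X @ w)).mean()

def local_gd(client_data, K, R, eta, d):
    """Local GD: per round, every client takes K local gradient steps
    from the shared iterate; the server then averages the local iterates."""
    w = np.zeros(d)
    for _ in range(R):
        local_iterates = []
        for X, y in client_data:
            v = w.copy()
            for _ in range(K):
                v -= eta * logistic_grad(v, X, y)
            local_iterates.append(v)
        w = np.mean(local_iterates, axis=0)  # communication step
    return w
```

With separable data the logistic loss has no finite minimizer, so progress shows up as the loss driven toward zero; note the stepsize below (eta = 2.0) deliberately exceeds 1/K = 0.1, the regime the abstract highlights.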