🤖 AI Summary
This work addresses the lack of rigorous convergence theory for sequential federated learning (SFL) under data heterogeneity. It derives convergence upper bounds for strongly convex, general convex, and non-convex objective functions, and constructs matching lower bounds for the strongly convex and general convex cases, so the guarantees are tight in those two settings. Comparing the upper bounds of SFL with those of parallel federated learning (PFL) shows that SFL outperforms PFL on heterogeneous data, at least when the level of heterogeneity is relatively high, which challenges the conventional intuition that “parallel is superior.” Experimental results are consistent with this theoretical finding. The work fills a gap in the convergence theory of SFL and provides a foundation for algorithm design and performance evaluation in heterogeneous federated learning.
📝 Abstract
There are two paradigms in Federated Learning (FL): parallel FL (PFL), where models are trained in a parallel manner across clients, and sequential FL (SFL), where models are trained in a sequential manner across clients. In PFL, clients perform local updates independently and send the updated model parameters to a global server for aggregation; in SFL, a client starts its local updates only after receiving the model parameters from the previous client in the sequence. Unlike that of PFL, the convergence theory of SFL on heterogeneous data is still lacking. To close this gap, we establish sharp convergence guarantees for SFL on heterogeneous data with both upper and lower bounds. Specifically, we derive upper bounds for strongly convex, general convex and non-convex objective functions, and construct matching lower bounds for strongly convex and general convex objective functions. We then compare the upper bounds of SFL with those of PFL, showing that SFL outperforms PFL on heterogeneous data (at least, when the level of heterogeneity is relatively high). Experimental results validate this counterintuitive theoretical finding.
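The difference between the two update patterns can be illustrated with a minimal sketch. It is not the paper's algorithm or experimental setup: the one-dimensional quadratic local objective, the deterministic gradient steps, the fixed client order, and the names `local_updates`, `pfl_round`, and `sfl_round` are all assumptions made for illustration.

```python
from typing import List


def local_updates(x: float, center: float, lr: float, steps: int) -> float:
    """Run `steps` gradient steps on the toy local objective 0.5 * (x - center)**2."""
    for _ in range(steps):
        x -= lr * (x - center)  # gradient of the toy objective is (x - center)
    return x


def pfl_round(x: float, centers: List[float], lr: float, steps: int) -> float:
    """Parallel FL: every client starts from the same global model, the server averages."""
    results = [local_updates(x, c, lr, steps) for c in centers]
    return sum(results) / len(results)


def sfl_round(x: float, centers: List[float], lr: float, steps: int) -> float:
    """Sequential FL: each client starts from the model handed over by the previous client."""
    for c in centers:  # fixed order here; an assumption of this sketch
        x = local_updates(x, c, lr, steps)
    return x


if __name__ == "__main__":
    # Two hypothetical clients whose local optima differ (heterogeneous data).
    centers = [1.0, 3.0]
    print("PFL after one round:", pfl_round(0.0, centers, lr=0.5, steps=1))
    print("SFL after one round:", sfl_round(0.0, centers, lr=0.5, steps=1))
```

In `pfl_round` the server averages models that were all trained from the same starting point, while in `sfl_round` the model passes through the clients one after another, so each client's update builds on the previous one. The sketch only shows the two communication patterns and says nothing about the convergence rates analyzed in the paper.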