🤖 AI Summary
This paper identifies the fundamental mechanism behind performance degradation in federated optimization under data heterogeneity: discrepancies among clients’ local optima elevate the lower bound of the global objective function, rendering perfect global fit infeasible and causing the global model to converge to an oscillatory region rather than a fixed point.
Method: Grounded in distributed optimization theory, we establish the first rigorous analytical link between the divergence of clients' local optima and the global model's convergence behavior. Our approach combines theoretical derivation with empirical validation across diverse tasks and model architectures, and we open-source a unified framework, FedTorch.
Contribution/Results: We provide a verifiable theoretical explanation for federated learning's performance degradation. We show, both theoretically and empirically, that global models cannot perfectly fit all client data under heterogeneity, and that convergence oscillation is an inherent, provable phenomenon. This work offers a novel theoretical perspective and a testable foundation for federated optimization.
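The elevated-lower-bound claim can be made concrete with a toy quadratic instance (an illustrative assumption for intuition, not the paper's general setting). Suppose each client $i$ has a quadratic local objective centered at its own optimum $w_i^\ast$:

```latex
\[
F_i(w) = \tfrac{1}{2}\,\|w - w_i^\ast\|^2, \qquad
F(w) = \frac{1}{n}\sum_{i=1}^{n} F_i(w).
\]
\[
\arg\min_w F(w) = \bar{w} := \frac{1}{n}\sum_{i=1}^{n} w_i^\ast,
\qquad
\min_w F(w) = \frac{1}{2n}\sum_{i=1}^{n} \|w_i^\ast - \bar{w}\|^2 .
\]
```

The global minimum value equals half the variance of the local optima, so whenever the $w_i^\ast$ are not all equal (i.e., under heterogeneity), this lower bound is strictly positive: no single global model fits every client's data perfectly.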
📝 Abstract
Federated optimization is a constrained form of distributed optimization that enables training a global model without directly sharing client data. Although existing algorithms can guarantee convergence in theory and often achieve stable training in practice, the reasons behind performance degradation under data heterogeneity remain unclear. To address this gap, the main contribution of this paper is to provide a theoretical perspective that explains why such degradation occurs. We introduce the assumption that heterogeneous client data lead to distinct local optima, and show that this assumption implies two key consequences: 1) the distance among clients' local optima raises the lower bound of the global objective, making perfect fitting of all client data impossible; and 2) in the final training stage, the global model oscillates within a region instead of converging to a single optimum, limiting its ability to fully fit the data. These results provide a principled explanation for performance degradation in non-iid settings, which we further validate through experiments across multiple tasks and neural network architectures. The framework used in this paper is open-sourced at: https://github.com/NPCLEI/fedtorch.
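Both consequences in the abstract can be reproduced in a minimal sketch. The snippet below (an illustrative toy, not the paper's experimental setup or the fedtorch framework) uses two clients with scalar quadratic losses whose minima differ, runs FedAvg-style rounds of local gradient steps, and alternates which client participates as a crude stand-in for partial participation. The global loss never drops below the heterogeneity-induced lower bound, and the late-stage iterates oscillate within a region rather than settling at a point:

```python
# Toy sketch (assumed setup, not the paper's method): two clients whose
# quadratic losses F_i(w) = 0.5 * (w - opt_i)^2 have different minima.
client_optima = [-1.0, 1.0]

def global_loss(w):
    """Average of the clients' local losses."""
    return sum(0.5 * (w - o) ** 2 for o in client_optima) / len(client_optima)

# The global minimum value is half the variance of the local optima, so
# distinct optima force a strictly positive floor on the global objective.
w_bar = sum(client_optima) / len(client_optima)
lower_bound = global_loss(w_bar)  # = 0.5 for optima at -1 and +1

def round_step(w, opt, lr=0.5, local_steps=5):
    """One communication round: the participating client runs several local
    gradient steps on its own loss, pulling w toward its own optimum."""
    for _ in range(local_steps):
        w -= lr * (w - opt)  # gradient of 0.5 * (w - opt)^2
    return w

# Alternate which client participates each round and record the trajectory.
w, ws, losses = 5.0, [], []
for t in range(50):
    w = round_step(w, client_optima[t % 2])
    ws.append(w)
    losses.append(global_loss(w))

print(f"lower bound on global loss: {lower_bound:.3f}")
print(f"last two iterates: {ws[-2]:.3f}, {ws[-1]:.3f}")  # bounce between regions
print(f"final global loss: {losses[-1]:.3f}")  # stays above the lower bound
```

In this toy, the iterates settle into a stable two-point cycle near the two client optima instead of converging to the global minimizer, mirroring the oscillatory final-stage behavior the paper describes.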