AI Summary
Severe data heterogeneity across clients significantly degrades model convergence in federated learning, and existing gradient tracking (GT) methods are limited to SGD, lacking compatibility with mainstream adaptive optimizers such as Adam.
Method: We propose a novel *parameter tracking* (PT) paradigm that generalizes GT from the gradient space to the parameter space, enabling, for the first time, tight integration with Adam. Based on PT, we design two new federated adaptive algorithms: FAdamGT and FAdamET.
Contribution/Results: Theoretically, we provide the first rigorous convergence guarantee for adaptive federated optimization under non-convex objectives. Technically, we achieve this via distributed first-order information correction and a principled federated adaptation of Adam, balancing communication efficiency and convergence stability. Extensive experiments demonstrate that our methods significantly reduce both communication and computational overhead across diverse heterogeneity settings, consistently outperforming state-of-the-art federated SGD and adaptive baselines.
Abstract
In Federated Learning (FL), model training performance is strongly impacted by data heterogeneity across clients. Gradient Tracking (GT) has recently emerged as a solution that mitigates this issue by introducing correction terms to local model updates. To date, GT has only been considered under Stochastic Gradient Descent (SGD)-based model training, while modern FL frameworks increasingly employ adaptive optimizers for improved convergence. In this work, we generalize the GT framework to a more flexible Parameter Tracking (PT) paradigm and propose two novel adaptive optimization algorithms, FAdamET and FAdamGT, that integrate PT into Adam-based FL. We provide a rigorous convergence analysis of these algorithms under non-convex settings. Our experimental results demonstrate that both proposed algorithms consistently outperform existing methods in total communication cost and total computation cost across varying levels of data heterogeneity, showing the effectiveness of correcting first-order information in federated adaptive optimization.