🤖 AI Summary
In edge federated learning (FL), applying a uniform gradient correction across clients with non-IID data causes over-correction and convergence instability. To address this, we propose a fine-grained, client-adaptive gradient correction and aggregation mechanism. Ours is the first work to formally identify and systematically mitigate the over-correction induced by fixed global correction coefficients in FL. We design a lightweight, client-specific correction framework that requires no additional communication or bandwidth overhead. We also establish the first FL convergence analysis that explicitly targets the over-correction problem, providing rigorous theoretical guarantees. Extensive experiments on multiple benchmark datasets show that our method significantly improves model accuracy and convergence stability, reduces communication rounds by 15%–32%, and achieves better real-time efficiency than FedAvg and SCAFFOLD, with empirical results closely matching the theoretical findings.
📝 Abstract
Non-independent and identically distributed (non-IID) data across edge clients has long posed significant challenges to federated learning (FL) training in edge computing environments. Prior works have proposed various methods to mitigate this statistical heterogeneity. Although these methods achieve good theoretical performance, we provide the first investigation into a hidden over-correction phenomenon caused by the uniform model correction coefficients that existing methods apply across all clients. Such over-correction can degrade model performance and even cause the model to fail to converge. To address this, we propose TACO, a novel algorithm that tackles the non-IID nature of clients' data through fine-grained, client-specific gradient correction and model aggregation, steering local models toward a more accurate global optimum. Moreover, we observe that leading FL algorithms generally achieve better model accuracy per communication round than per unit of wall-clock time, owing to the extra computation overhead they impose on clients. To improve training efficiency, TACO deploys a lightweight model correction and tailored aggregation approach that requires minimal computation overhead and no extra information beyond the synchronized model parameters. To validate TACO's effectiveness, we present the first FL convergence analysis that reveals the root cause of over-correction. Extensive experiments on various datasets confirm TACO's superior and stable performance in practice.
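To make the over-correction intuition concrete, the following is a minimal, self-contained sketch of the contrast the abstract draws. It is not TACO's actual correction rule: the adaptive coefficient below (scaling the correction by each client's measured gradient drift) is a hypothetical stand-in, and the gradients and drift values are invented for illustration. A uniform coefficient pulls a low-drift client toward the global gradient just as hard as a high-drift one; a per-client coefficient does not.

```python
import math

def norm(v):
    """Euclidean norm of a plain-list vector."""
    return math.sqrt(sum(x * x for x in v))

def corrected(g, g_global, coeff):
    """Pull a client gradient toward the global gradient with weight coeff."""
    return [gi + coeff * (gg - gi) for gi, gg in zip(g, g_global)]

# Hypothetical setup: four clients whose local gradients have drifted from
# the global gradient by different amounts (a stand-in for non-IID data).
global_grad = [1.0, -2.0]
drifts = [[0.05, -0.05], [0.3, 0.2], [1.0, -0.8], [2.0, 1.5]]
client_grads = [[gg + d for gg, d in zip(global_grad, drift)]
                for drift in drifts]

# Uniform correction: one fixed coefficient for every client. The
# low-drift client is pulled just as hard as the high-drift one, which
# is the over-correction the abstract describes.
ALPHA = 0.9
uniform = [corrected(g, global_grad, ALPHA) for g in client_grads]

# Client-adaptive correction (illustrative rule, not TACO's actual one):
# scale the coefficient by each client's measured drift, so a client
# that already tracks the global direction is barely corrected.
def adaptive_coeff(g, g_global, alpha_max=0.9):
    drift = norm([gi - gg for gi, gg in zip(g, g_global)])
    return alpha_max * drift / (1.0 + drift)

adaptive = [corrected(g, global_grad, adaptive_coeff(g, global_grad))
            for g in client_grads]
```

Under this toy rule the correction strength grows monotonically with a client's drift and stays strictly below the uniform coefficient, so well-aligned clients are left mostly untouched; this is only meant to illustrate why a single global coefficient can over-correct, not to reproduce the paper's method.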