🤖 AI Summary
Existing semi-decentralized federated learning (SD-FL) schemes for fog networks rely on a gradient diversity assumption, and their performance degrades as data across devices becomes more heterogeneous. To address this, the paper proposes Semi-Decentralized Gradient Tracking (SD-GT), which introduces gradient tracking terms in both the device-to-device (D2D) and device-to-server communication layers and is the first SD-FL method to remove the gradient diversity assumption. The authors derive convergence upper bounds for non-convex, convex, and strongly convex objectives, and use these bounds to design a communication scheduling algorithm that jointly tunes the subnet sampling rate and the number of D2D aggregation rounds in each global training interval. Experiments on several datasets show that SD-GT improves trained model quality and reduces communication cost relative to SD-FL and gradient-tracking baselines.
📝 Abstract
Federated learning (FL) encounters scalability challenges when implemented over fog networks that do not follow FL's conventional star topology architecture. Semi-decentralized FL (SD-FL) has been proposed as a solution for device-to-device (D2D) enabled networks that divides model cooperation into two stages: at the lower stage, D2D communications are employed for local model aggregations within subnetworks (subnets), while the upper stage handles device-server (DS) communications for global model aggregations. However, existing SD-FL schemes are based on gradient diversity assumptions that become performance bottlenecks as data distributions become more heterogeneous. In this work, we develop semi-decentralized gradient tracking (SD-GT), the first SD-FL methodology that removes the need for such assumptions by incorporating tracking terms into device updates for each communication layer. Our analytical characterization of SD-GT establishes upper bounds on convergence for non-convex, convex, and strongly convex problems. We show how the bounds enable the development of an optimization algorithm that navigates the performance-efficiency trade-off by tuning the subnet sampling rate and the number of D2D rounds for each global training interval. Our subsequent numerical evaluations demonstrate that SD-GT obtains substantial improvements in trained model quality and communication cost relative to baselines in SD-FL and gradient tracking on several datasets.
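
To illustrate why tracking terms matter under heterogeneous data, here is a minimal sketch of the classic single-layer decentralized gradient tracking update on a toy problem. It is not the SD-GT algorithm from the paper, which adds tracking terms at both the D2D and DS layers. The ring topology, the quadratic local objectives f_i(x) = 0.5 (x - b_i)^2, the step size, and the function names `ring_mixing_matrix` and `run_decentralized` are all assumptions made for illustration.

```python
import numpy as np


def ring_mixing_matrix(n):
    """Doubly stochastic mixing matrix for a ring of n >= 3 devices.

    Each device averages itself and its two neighbours with weight 1/3.
    """
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i, j % n] = 1.0 / 3.0
    return W


def run_decentralized(b, alpha=0.1, steps=500, tracking=True):
    """Minimise (1/n) * sum_i 0.5 * (x - b[i])**2 over a ring of devices.

    Device i only sees b[i], so differing b[i] values model data
    heterogeneity. The global minimiser is mean(b).
    """
    n = len(b)
    W = ring_mixing_matrix(n)

    def grad(x):
        # Entry i is the gradient of device i's local objective at x[i].
        return x - b

    x = np.zeros(n)   # one scalar model per device
    y = grad(x)       # tracker initialised to the local gradients
    for _ in range(steps):
        if tracking:
            # Step along the tracked estimate of the global gradient.
            x_new = W @ x - alpha * y
            # Mix trackers with neighbours, then correct by the change
            # in the local gradient.
            y = W @ y + grad(x_new) - grad(x)
        else:
            # Plain decentralized gradient descent: local gradients only.
            x_new = W @ x - alpha * grad(x)
        x = x_new
    return x


b = np.array([-4.0, -1.0, 0.0, 2.0, 8.0])   # heterogeneous local optima
x_gt = run_decentralized(b, tracking=True)
x_dgd = run_decentralized(b, tracking=False)
print("global optimum:        ", b.mean())
print("with tracking:         ", x_gt)
print("without tracking (DGD):", x_dgd)
```

In this toy setting, with a constant step size, the tracked update drives every device to the global minimizer, while the untracked update settles at device-specific points that are biased toward each device's local optimum. This is the kind of heterogeneity-induced drift that tracking terms are meant to correct.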