A Hierarchical Gradient Tracking Algorithm for Mitigating Subnet-Drift in Fog Learning Networks

📅 2024-09-25

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

Semi-decentralized federated learning (SD-FL) in fog learning degrades under data heterogeneity because it relies on a gradient diversity assumption. To address this, the paper proposes Semi-Decentralized Gradient Tracking (SD-GT), which introduces gradient tracking terms in both the device-to-device (D2D) and device-to-server communication layers and is the first SD-FL method to remove that assumption. The authors derive convergence upper bounds for non-convex, convex, and strongly convex objectives. They also design an optimization algorithm that jointly tunes the subnet sampling rate and the number of D2D aggregation rounds for each global training interval. Experiments on several datasets show that SD-GT substantially improves trained model quality and reduces communication cost relative to SD-FL and gradient-tracking baselines.


📝 Abstract

Federated learning (FL) encounters scalability challenges when implemented over fog networks that do not follow FL's conventional star topology architecture. Semi-decentralized FL (SD-FL) offers a solution for device-to-device (D2D) enabled networks that divides model cooperation into two stages: at the lower stage, D2D communications are employed for local model aggregations within subnetworks (subnets), while the upper stage handles device-server (DS) communications for global model aggregations. However, existing SD-FL schemes are based on gradient diversity assumptions that become performance bottlenecks as data distributions become more heterogeneous. In this work, we develop semi-decentralized gradient tracking (SD-GT), the first SD-FL methodology that removes the need for such assumptions by incorporating tracking terms into device updates for each communication layer. Our analytical characterization of SD-GT reveals upper bounds on convergence for non-convex, convex, and strongly-convex problems. We show how the bounds enable the development of an optimization algorithm that navigates the performance-efficiency trade-off by tuning subnet sampling rate and D2D rounds for each global training interval. Our subsequent numerical evaluations demonstrate that SD-GT obtains substantial improvements in trained model quality and communication cost relative to baselines in SD-FL and gradient tracking on several datasets.
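
The two-stage cooperation described in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: it only shows the generic SD-FL pattern of D2D consensus averaging inside each subnet followed by a server-side average over sampled subnets. The mixing matrix, the subnet contents, and the function names `d2d_rounds` and `global_aggregate` are assumptions made for illustration.

```python
import numpy as np

def d2d_rounds(models, mixing, rounds):
    """Lower stage: repeated D2D consensus averaging within one subnet.

    models: array of shape (n_devices, dim), one row per device.
    mixing: doubly stochastic matrix of shape (n_devices, n_devices)
            (an assumed stand-in for the subnet's D2D topology).
    """
    for _ in range(rounds):
        models = mixing @ models
    return models

def global_aggregate(subnets, sampled_ids):
    """Upper stage: the server averages the mean model of each sampled subnet."""
    return np.mean([subnets[i].mean(axis=0) for i in sampled_ids], axis=0)

# Hypothetical setup: two subnets of three devices, 2-dimensional models.
mixing = np.array([[0.50, 0.25, 0.25],
                   [0.25, 0.50, 0.25],
                   [0.25, 0.25, 0.50]])
subnets = [np.array([[0.0, 0.0], [3.0, 3.0], [6.0, 6.0]]),
           np.array([[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]])]

# More D2D rounds pull the devices in a subnet closer to their subnet average.
mixed = [d2d_rounds(m, mixing, rounds=5) for m in subnets]
global_model = global_aggregate(mixed, sampled_ids=[0, 1])
```

Because the assumed mixing matrix is doubly stochastic, D2D rounds leave each subnet's average unchanged and only shrink the disagreement between its devices. The number of rounds and the set of sampled subnets correspond to the two quantities the abstract says are tuned per global training interval.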

Problem

Research questions and friction points this paper is trying to address.

Addresses subnet-drift in fog learning networks with hierarchical gradient tracking

Removes gradient diversity assumptions for heterogeneous data distributions

Optimizes the performance-efficiency trade-off by tuning the subnet sampling rate and the number of D2D rounds

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical gradient tracking for subnet-drift mitigation

Device updates with tracking terms per communication layer

Joint tuning of subnet sampling rate and D2D rounds
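
As a rough illustration of what a tracking term does, the sketch below runs textbook single-layer decentralized gradient tracking on a toy problem. It is not SD-GT's update rule, which this summary does not spell out and which uses tracking terms at two communication layers. The local objectives f_i(x) = 0.5 (x - b_i)^2, the mixing matrix, the step size, and the function name `gradient_tracking` are all assumptions chosen for the example.

```python
import numpy as np

def gradient_tracking(targets, mixing, step=0.1, iters=200):
    """Standard decentralized gradient tracking on scalar quadratics.

    Device i holds f_i(x) = 0.5 * (x - targets[i])**2, so its local gradient
    is x_i - targets[i]. The tracker y estimates the network-average gradient.
    """
    grad = lambda x: x - targets
    x = np.zeros_like(targets)
    y = grad(x)  # initialize the tracker with the local gradients
    for _ in range(iters):
        x_new = mixing @ x - step * y
        # Mix the trackers, then correct by the change in local gradients.
        y = mixing @ y + grad(x_new) - grad(x)
        x = x_new
    return x

# Heterogeneous local minimizers (0, 3, 6); the global minimizer is their mean, 3.
targets = np.array([0.0, 3.0, 6.0])
mixing = np.array([[0.50, 0.25, 0.25],
                   [0.25, 0.50, 0.25],
                   [0.25, 0.25, 0.50]])
x_final = gradient_tracking(targets, mixing)
```

In this toy setting every device ends near the global minimizer even though the local minimizers differ, because the tracker keeps the average gradient rather than each device's own gradient in the update. That is the general mechanism by which tracking counters drift under heterogeneous data.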
