🤖 AI Summary
To address slow convergence in wireless federated learning caused by data heterogeneity and limited bandwidth, this paper proposes a collective gradient divergence (CGD) metric that jointly models device-level and sample-level heterogeneity. For classification tasks, the device-level CGD is transformed into the weighted earth moving distance (WEMD) between the group-wise and global data distributions, while the sample-level CGD is statistically bounded by a sampling variance that shrinks as more samples are scheduled. The authors design a polynomial-time scheduling algorithm, FedCGD, that on CIFAR-10 improves classification accuracy by up to 4.2% while scheduling 41.8% fewer devices, and that flexibly trades off WEMD against sampling variance. The core innovation lies in redefining gradient divergence from the perspective of the scheduled device group as a whole, departing from conventional single-device bias modeling, thereby improving communication efficiency and model convergence.
📝 Abstract
Federated learning (FL) is a promising paradigm for multiple devices to cooperatively train a model. When applied in wireless networks, two issues consistently affect the performance of FL, i.e., data heterogeneity of devices and limited bandwidth. Many papers have investigated device scheduling strategies considering the two issues. However, most of them treat data heterogeneity as a property of individual devices. In this paper, we prove that the convergence speed of FL is affected by the sum of device-level and sample-level collective gradient divergence (CGD). The device-level CGD refers to the gradient divergence of the scheduled device group, instead of the sum of the individual device divergences. The sample-level CGD is statistically upper bounded by sampling variance, which is inversely proportional to the total number of samples scheduled for local update. To derive a tractable form of the device-level CGD, we further consider a classification problem and transform it into the weighted earth moving distance (WEMD) between the group distribution and the global distribution. Then we propose the FedCGD algorithm to minimize the sum of multi-level CGDs by balancing WEMD and sampling variance, within polynomial time. Simulations show that the proposed strategy increases classification accuracy on the CIFAR-10 dataset by up to 4.2% while scheduling 41.8% fewer devices, and flexibly switches between reducing WEMD and reducing sampling variance.
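The trade-off the abstract describes can be illustrated with a toy sketch. Assuming each device reports its per-class sample counts, a greedy scheduler can balance a distribution-gap term (here approximated as a weighted L1 distance between the scheduled group's aggregate label distribution and the global one, a simplification of the paper's WEMD) against a sampling-variance proxy that is inversely proportional to the number of scheduled samples. All function names, the objective form, and the greedy rule below are illustrative assumptions, not the paper's actual FedCGD algorithm.

```python
import numpy as np

def wemd(group_dist, global_dist, weights=None):
    """Weighted L1 gap between label distributions -- an illustrative
    stand-in for the paper's weighted earth moving distance (WEMD)."""
    w = np.ones_like(global_dist) if weights is None else weights
    return float(np.sum(w * np.abs(group_dist - global_dist)))

def objective(selected, label_counts, global_dist, var_coeff=1.0):
    """Distribution-gap term plus a sampling-variance proxy ~ 1/n,
    where n is the total number of samples the group contributes."""
    counts = label_counts[selected].sum(axis=0)
    n = counts.sum()
    if n == 0:
        return float("inf")
    return wemd(counts / n, global_dist) + var_coeff / n

def greedy_schedule(label_counts, budget, var_coeff=1.0):
    """Greedily add the device that most reduces the combined objective,
    stopping early once no candidate improves it (polynomial time)."""
    global_dist = label_counts.sum(axis=0) / label_counts.sum()
    selected, remaining = [], set(range(len(label_counts)))
    while remaining and len(selected) < budget:
        best = min(remaining,
                   key=lambda d: objective(selected + [d], label_counts,
                                           global_dist, var_coeff))
        if selected and (objective(selected + [best], label_counts,
                                   global_dist, var_coeff)
                         >= objective(selected, label_counts,
                                      global_dist, var_coeff)):
            break  # adding any further device no longer helps
        selected.append(best)
        remaining.remove(best)
    return selected
```

Raising `var_coeff` pushes the scheduler toward including more samples (lower sampling variance), while lowering it prioritizes matching the global label distribution, mirroring the flexible switch between the two objectives that the abstract reports.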