🤖 AI Summary
Problem: In heterogeneous wireless environments, over-the-air federated learning (OTA-FL) suffers from slow convergence and high gradient variance because unbiased aggregation is bottlenecked by the weakest link, and existing analyses of biased OTA-FL do not cover non-convex loss landscapes.
Method: This paper proposes a stochastic gradient update mechanism that permits controllable structured bias, jointly optimized with power control for robust over-the-air aggregation.
Contribution/Results: The paper establishes, for the first time, a finite-time stationarity bound for non-convex OTA-FL under statistical channel state information (CSI) uncertainty and heterogeneous path loss, thereby theoretically characterizing the bias-variance trade-off. It further designs an SCA-based, CSI-dependent power allocation algorithm to ensure stable and efficient aggregation. Experiments on non-convex image classification tasks demonstrate that the method significantly accelerates convergence and improves generalization accuracy, outperforming existing OTA-FL baselines.
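Finite-time stationarity bounds for biased stochastic gradient methods commonly decompose into three terms. A generic illustrative form is sketched below; the constants c1, c2, c3, step size η, smoothness L, variance σ², and bias magnitude B are placeholder symbols for exposition, not the paper's exact statement:

```latex
\[
\frac{1}{T}\sum_{t=0}^{T-1}\mathbb{E}\big\|\nabla f(\mathbf{w}_t)\big\|^2
\;\le\;
\underbrace{\frac{c_1\,\big(f(\mathbf{w}_0)-f^{\star}\big)}{\eta T}}_{\text{initialization}}
\;+\;
\underbrace{c_2\,\eta L\,\sigma^2}_{\text{update variance}}
\;+\;
\underbrace{c_3\,B^2}_{\text{squared structured bias}} .
\]
```

In this generic form, shrinking the bias B (e.g., by inverting every channel down to the weakest device) inflates the effective variance σ², and vice versa, which is the trade-off the power-control design optimizes.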
📝 Abstract
Over-the-air (OTA) federated learning (FL) has been well recognized as a scalable paradigm that exploits the waveform superposition of the wireless multiple-access channel to aggregate model updates in a single channel use. Existing OTA-FL designs largely assume homogeneous wireless conditions (equal path loss across devices) or enforce zero-bias model updates to guarantee convergence. Under heterogeneous wireless scenarios, however, such designs are constrained by the weakest device and inflate the update variance. Moreover, prior analyses of biased OTA-FL largely address convex objectives, while most modern AI models are highly non-convex. Motivated by these gaps, we study OTA-FL with stochastic gradient descent (SGD) for general smooth non-convex objectives under wireless heterogeneity. We develop novel OTA-FL SGD updates that allow a structured, time-invariant model bias while facilitating reduced-variance updates. We derive a finite-time stationarity bound (on the expected time-average squared gradient norm) that explicitly reveals a bias-variance trade-off. To optimize this trade-off, we formulate a non-convex joint OTA power-control design problem and develop an efficient successive convex approximation (SCA) algorithm that requires only statistical CSI at the base station. Experiments on a non-convex image classification task validate the approach: the SCA-based design accelerates convergence via an optimized bias and improves generalization over prior OTA-FL baselines.
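To make the weakest-link effect concrete, the following is a minimal simulation sketch of one OTA aggregation round. All channel gains, the power budget, and the capped power rule below are illustrative assumptions, not the paper's SCA design: an unbiased scheme must invert every channel down to the weakest device's level, which amplifies receiver noise, while a capped (biased) scheme accepts a structured aggregation bias in exchange for far lower noise amplification.

```python
import numpy as np

rng = np.random.default_rng(0)
K, d = 5, 10                               # devices, model dimension
h = np.array([1.0, 0.8, 0.6, 0.4, 0.05])   # heterogeneous path gains (assumed values)
b_max = 1.0                                # per-device transmit-amplitude budget (assumed)
sigma = 0.5                                # receiver noise std (assumed)

g = rng.normal(size=(K, d))                # local stochastic gradients
g_bar = g.mean(axis=0)                     # ideal noise-free, unbiased average

def aggregate_mse(b, trials=2000):
    """Mean squared error of the OTA estimate against the ideal average g_bar.

    Devices pre-scale their gradients by b_k; the channel superposes h_k * b_k * g_k
    and adds receiver noise; the server normalizes by the sum of effective weights.
    """
    s = (h * b).sum()
    err = 0.0
    for _ in range(trials):
        y = (h * b) @ g + sigma * rng.normal(size=d)
        err += np.sum((y / s - g_bar) ** 2)
    return err / trials

# Unbiased: exact channel inversion, every device aligned to the weakest link,
# so all effective weights h_k * b_k collapse to b_max * h.min().
b_unbiased = (b_max * h.min()) / h

# Biased: cap the inversion target, trading a structured bias (unequal effective
# weights) for a much larger received-signal scale and lower noise variance.
b_biased = np.minimum(0.5 * b_max / h, b_max)

print("unbiased MSE:", aggregate_mse(b_unbiased))
print("biased   MSE:", aggregate_mse(b_biased))
```

In this sketch the biased scheme's MSE is dominated by a small, fixed aggregation bias, while the unbiased scheme's MSE is dominated by noise amplified by the weakest device's tiny gain; the paper's power-control problem can be read as choosing the pre-scalers to balance exactly these two error sources.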