Non-Convex Over-the-Air Heterogeneous Federated Learning: A Bias-Variance Trade-off

📅 2025-10-30
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
In heterogeneous wireless environments, over-the-air federated learning (OTA-FL) is bottlenecked by the weakest link and suffers from slow convergence and high gradient variance; non-convex loss landscapes make these effects harder to analyze. Method: This paper proposes a stochastic gradient update mechanism that permits a controllable structured bias, jointly optimized with power control for robust over-the-air aggregation. Contribution/Results: We establish, for the first time, a finite-time stationarity bound for non-convex OTA-FL under statistical channel state information (CSI) uncertainty and heterogeneous path loss, thereby theoretically characterizing the bias-variance trade-off. We further design an SCA-based, CSI-dependent power allocation algorithm to ensure stable and efficient aggregation. Experiments on non-convex image classification tasks demonstrate that our method significantly accelerates convergence and improves generalization accuracy, outperforming existing OTA-FL baselines.

📝 Abstract
Over-the-air (OTA) federated learning (FL) is well recognized as a scalable paradigm that exploits the waveform superposition of the wireless multiple-access channel to aggregate model updates in a single channel use. Existing OTA-FL designs largely enforce zero-bias model updates, either by assuming homogeneous wireless conditions (equal path loss across devices) or by forcing zero-bias updates to guarantee convergence. Under heterogeneous wireless scenarios, however, such designs are constrained by the weakest device and inflate the update variance. Moreover, prior analyses of biased OTA-FL largely address convex objectives, while most modern AI models are highly non-convex. Motivated by these gaps, we study OTA-FL with stochastic gradient descent (SGD) for general smooth non-convex objectives under wireless heterogeneity. We develop novel OTA-FL SGD updates that allow a structured, time-invariant model bias while facilitating reduced-variance updates. We derive a finite-time stationarity bound (on the expected time-average squared gradient norm) that explicitly reveals a bias-variance trade-off. To optimize this trade-off, we pose a non-convex joint OTA power-control design and develop an efficient successive convex approximation (SCA) algorithm that requires only statistical CSI at the base station. Experiments on a non-convex image classification task validate the approach: the SCA-based design accelerates convergence via an optimized bias and improves generalization over prior OTA-FL baselines.
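The weakest-link constraint described in the abstract can be illustrated with a minimal sketch. The cap rule, the path-loss values, and the scale parameter `s_max` below are illustrative assumptions, not the paper's actual power-control scheme: an unbiased channel-inversion design forces the weakest device's transmit power to blow up, while capping the transmit scale bounds the power at the cost of under-weighting weak devices, a structured, time-invariant bias.

```python
import math

n_devices = 10
# Heterogeneous path losses h_n (device 1 is the weakest link); values are
# a made-up example, evenly spaced from 0.1 to 1.0.
path_loss = [0.1 * n for n in range(1, n_devices + 1)]

# Unbiased ("channel inversion") design: pre-scale each gradient by
# 1/sqrt(h_n) so it arrives with effective weight 1; the weakest device
# then needs transmit power 1/h_n, which blows up as h_n -> 0.
unbiased_scale = [1.0 / math.sqrt(h) for h in path_loss]
unbiased_power = max(s * s for s in unbiased_scale)

# Biased design (an illustrative cap rule, not the paper's exact scheme):
# clip the transmit scale at s_max, so weak devices arrive with effective
# weight w_n = min(1, s_max * sqrt(h_n)) < 1, trading a structured bias
# for a bounded per-device power budget.
s_max = 2.0
biased_scale = [min(s, s_max) for s in unbiased_scale]
biased_power = max(s * s for s in biased_scale)
weights = [s * math.sqrt(h) for s, h in zip(biased_scale, path_loss)]

print(f"peak transmit power: unbiased {unbiased_power:.1f}, biased {biased_power:.1f}")
print(f"effective weights under bias: {[round(w, 2) for w in weights]}")
```

The variance side of the trade-off follows the same logic: bounding the transmit scale also bounds how much receiver noise is amplified relative to the signal, which is what an unconstrained inversion of a near-zero channel destroys.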
Problem

Research questions and friction points this paper is trying to address.

Addresses non-convex federated learning under heterogeneous wireless conditions
Develops biased OTA-FL updates to optimize bias-variance trade-off
Proposes joint power-control design for improved convergence and generalization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Structured time-invariant bias for non-convex objectives
Optimized bias-variance trade-off via power control
Successive convex approximation using statistical CSI
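The SCA idea behind the last bullet can be shown on a toy problem. The objective below is a stand-in, not the paper's power-control formulation: SCA replaces a non-convex objective with a sequence of convex surrogates, each minimized exactly, driving the iterate to a stationary point.

```python
import math

def f(x):
    # Toy smooth non-convex objective standing in for the non-convex
    # joint power-control design; chosen only so the surrogate has a
    # closed-form minimizer.
    return x**4 - 3.0 * x**2 + x

def sca_step(x):
    """One SCA iteration via a difference-of-convex split:
    f = g - h with g(y) = y^4 + y and h(y) = 3y^2 (both convex).
    Linearizing h at the current iterate x gives the convex surrogate
    g(y) - 6*x*y (up to constants); setting its derivative
    4y^3 + 1 - 6x to zero yields y = cbrt((6x - 1) / 4)."""
    t = (6.0 * x - 1.0) / 4.0
    return math.copysign(abs(t) ** (1.0 / 3.0), t)

x = 1.0
for _ in range(200):
    x = sca_step(x)

# At convergence x is a stationary point of f: f'(x) = 4x^3 - 6x + 1 = 0.
print(f"x = {x:.6f}, f'(x) = {4 * x**3 - 6 * x + 1:.2e}, f(x) = {f(x):.6f}")
```

Each surrogate upper-bounds f and touches it at the current iterate, so the objective is non-increasing across iterations; the paper applies the same mechanism to its power-control problem using only statistical CSI.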
Muhammad Faraz Ul Abrar
School of Electrical, Computer and Energy Engineering, Arizona State University
Nicolò Michelusi
Arizona State University
5G · mm-wave · wireless communications