🤖 AI Summary
In federated learning, partial client participation induces a mismatch between the availability distribution $q$ and the importance distribution $p$, causing bias and unstable convergence in FedAvg. To address this, we propose FedAVOT—the first framework to incorporate optimal transport (OT) into federated aggregation. FedAVOT employs masked OT to align $p$ and $q$, and leverages Sinkhorn scaling for efficient computation of transport weights. Theoretically, under nonsmooth convex settings, FedAVOT achieves a convergence rate independent of the number of participating clients, with provable guarantees even in extreme sparsity scenarios involving as few as two clients. Empirically, FedAVOT significantly outperforms FedAvg on highly heterogeneous, low-availability, and fairness-sensitive tasks—demonstrating markedly improved training stability and generalization performance.
📝 Abstract
Federated Learning (FL) allows distributed model training without sharing raw data, but suffers when client participation is partial. In practice, the distribution of available users (the *availability distribution* $q$) rarely aligns with the distribution defining the optimization objective (the *importance distribution* $p$), leading to biased and unstable updates under classical FedAvg. We propose **Federated AVerage with Optimal Transport (FedAVOT)**, which formulates aggregation as a masked optimal transport problem aligning $q$ and $p$. Using Sinkhorn scaling, **FedAVOT** computes transport-based aggregation weights with provable convergence guarantees. **FedAVOT** achieves a standard $\mathcal{O}(1/\sqrt{T})$ rate in a nonsmooth convex FL setting, independent of the number of participating users per round. Our experiments confirm substantially improved performance compared to FedAvg across heterogeneous, fairness-sensitive, and low-availability regimes, even when only two clients participate per round.
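As a rough illustration of the Sinkhorn step mentioned above, the sketch below aligns a toy importance distribution $p$ with an availability distribution $q$ via entropic optimal transport. This is a generic (unmasked) Sinkhorn loop, not the paper's masked formulation; the cost matrix, regularization value, and iteration count are illustrative assumptions.

```python
import numpy as np

def sinkhorn(p, q, C, reg=0.5, n_iters=500):
    """Entropic OT between histograms p (n,) and q (m,) with cost C (n, m).

    Returns a transport plan P whose row sums approximate p and whose
    column sums approximate q. Plain Sinkhorn scaling; a masked variant
    would zero out kernel entries for unavailable (client, atom) pairs.
    """
    K = np.exp(-C / reg)           # Gibbs kernel
    u = np.ones_like(p)
    for _ in range(n_iters):
        v = q / (K.T @ u)          # match column marginals
        u = p / (K @ v)            # match row marginals
    return u[:, None] * K * v[None, :]

# Toy example: 3 importance atoms, 2 available clients (values illustrative).
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.6, 0.4])
C = np.abs(np.arange(3)[:, None] - np.arange(2)[None, :]).astype(float)
P = sinkhorn(p, q, C)
```

Transport-based aggregation weights for the available clients could then be read off the plan, e.g. by normalizing each column of `P`; that specific readout is our assumption, not a detail given in the abstract.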