AI Summary
This work addresses the high computational overhead of existing Byzantine-robust aggregation methods in federated learning. These methods operate directly on high-dimensional gradients, which limits their scalability to large models. The authors propose Projected Dimensionality Reduction (PDR), a framework that compresses client gradients into a low-dimensional subspace via sparse random projection and computes reliability weights there, substantially reducing server-side aggregation cost while preserving robustness. PDR is presented as a universal acceleration framework for vector-level distance-based robust aggregators, with server complexity $O(Mp)$ for $M$ clients and model dimension $p$, matching the lower bound needed just to read the gradients. It attains convergence rates of $O(1/\sqrt{T})$ in non-convex settings and $O(1/T)$ under strong convexity. Experiments show orders-of-magnitude speedups in aggregation time with competitive convergence, at the cost of a bounded, tunable inflation of the Byzantine error floor by a factor of $\frac{1+\varepsilon}{1-\varepsilon}$.
Abstract
Federated Learning (FL) enables multiple clients to collaboratively train models without sharing raw data, but it is highly vulnerable to Byzantine attacks. Existing robust approaches can neutralize these threats but incur substantial computational overhead during high-dimensional gradient aggregation. This overhead scales poorly with model size and increasingly dominates the training cost as modern models grow larger. To address this bottleneck, we propose Projected Dimensionality Reduction (PDR), a universal acceleration framework for vector-level distance-based robust aggregators. PDR compresses gradients into a much smaller subspace via sparse random projection and computes the reliability weights used for robust aggregation in that subspace. This reduces the server computational complexity to an optimal $\mathcal{O}(Mp)$, where $M$ is the number of clients and $p$ is the model dimension, matching the theoretical lower bound required merely to read the gradients. We establish convergence guarantees under the standard assumptions used in prior Byzantine-robust FL analyses. By leveraging the Subspace Embedding Theorem, we show that PDR achieves optimal convergence rates of $\mathcal{O}(1/\sqrt{T})$ for non-convex functions and $\mathcal{O}(1/T)$ for strongly convex functions, where $T$ denotes the number of iterations. Crucially, we show that this acceleration comes almost for free: it inflates the inherent Byzantine error floor only by a bounded, tunable factor of $\frac{1+\varepsilon}{1-\varepsilon}$. Experimental results on benchmark datasets confirm that integrating PDR with existing aggregators yields orders-of-magnitude speedups while maintaining highly competitive convergence performance.
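The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes a CountSketch-style projection (one signed nonzero per coordinate) as the sparse random projection and a Multi-Krum-style selection as the distance-based aggregator. All function names, the sketch size `k`, and the toy data are hypothetical.

```python
import random


def make_count_sketch(p, k, seed=0):
    """Assumed projection: each of the p coordinates gets one bucket and one sign."""
    rng = random.Random(seed)
    buckets = [rng.randrange(k) for _ in range(p)]
    signs = [rng.choice((-1.0, 1.0)) for _ in range(p)]
    return buckets, signs


def project(grad, buckets, signs, k):
    """Compress a p-dimensional gradient to k dimensions with one signed add per coordinate."""
    out = [0.0] * k
    for value, b, s in zip(grad, buckets, signs):
        out[b] += s * value
    return out


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def reliability_weights(sketches, num_byzantine):
    """Multi-Krum-style 0/1 weights, computed only from the k-dimensional sketches."""
    m = len(sketches)
    neighbours = m - num_byzantine - 2  # Krum's neighbourhood size
    scores = []
    for i in range(m):
        dists = sorted(sq_dist(sketches[i], sketches[j]) for j in range(m) if j != i)
        scores.append(sum(dists[:neighbours]))
    keep = m - num_byzantine
    chosen = set(sorted(range(m), key=scores.__getitem__)[:keep])
    return [1.0 / keep if i in chosen else 0.0 for i in range(m)]


def pdr_aggregate(grads, k, num_byzantine, seed=0):
    """Score clients in the sketch space, then average the full gradients."""
    p = len(grads[0])
    buckets, signs = make_count_sketch(p, k, seed)
    sketches = [project(g, buckets, signs, k) for g in grads]
    weights = reliability_weights(sketches, num_byzantine)
    agg = [0.0] * p
    for w, g in zip(weights, grads):
        if w:  # skip clients that received zero weight
            for j, value in enumerate(g):
                agg[j] += w * value
    return agg, weights


# Toy data: 8 honest clients near the all-ones gradient, 2 colluding attackers.
rng = random.Random(42)
honest = [[1.0 + rng.gauss(0.0, 0.1) for _ in range(200)] for _ in range(8)]
byzantine = [[-50.0] * 200 for _ in range(2)]
agg, weights = pdr_aggregate(honest + byzantine, k=16, num_byzantine=2)
print(weights)
```

In this sketch, projecting all clients and forming the final weighted sum each touch every gradient coordinate once, which is $O(Mp)$, while the pairwise scoring costs $O(M^2 k)$ and is independent of $p$. In standard random-projection analyses, a larger `k` corresponds to a smaller distortion $\varepsilon$. As a worked instance of the factor stated above, $\varepsilon = 0.1$ gives $\frac{1+0.1}{1-0.1} \approx 1.22$.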