🤖 AI Summary
To address robust gradient aggregation in high-dimensional distributed learning with an arbitrary number of Byzantine adversaries, this paper proposes an aggregation method built on a direct high-dimensional semi-verified mean estimation procedure. The method first identifies a subspace. The components of the mean perpendicular to that subspace are estimated from the gradient vectors uploaded by worker machines, while the components within the subspace are estimated from an auxiliary dataset. The theoretical analysis shows that the method attains minimax-optimal statistical rates, with a dependence on dimensionality that is significantly improved over previous work, which mitigates the "curse of dimensionality."
📝 Abstract
Robust distributed learning with Byzantine failures has attracted extensive research interest in recent years. However, most existing methods suffer from the curse of dimensionality, which becomes increasingly serious as modern machine learning models grow in complexity. In this paper, we design a new method that is suitable for high-dimensional problems with an arbitrary number of Byzantine attackers. The core of our design is a direct high-dimensional semi-verified mean estimation method. Our idea is to first identify a subspace. The components of the mean perpendicular to this subspace can be estimated from the gradient vectors uploaded by worker machines, while the components within this subspace are estimated using an auxiliary dataset. We then use our new method as the aggregator in distributed learning problems. Our theoretical analysis shows that the new method achieves minimax-optimal statistical rates. In particular, the dependence on dimensionality is significantly improved compared with previous works.
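The two-part estimate described in the abstract can be illustrated with a minimal NumPy sketch. This is not the paper's algorithm: the abstract does not say how the subspace is chosen, so the sketch assumes it is spanned by the top `k` principal directions of the worker gradients after centering at the auxiliary mean. The function name `semi_verified_mean`, the parameter `k`, and the use of plain sample means are all hypothetical choices made for illustration.

```python
import numpy as np

def semi_verified_mean(worker_grads, aux_grads, k):
    """Illustrative two-part mean estimate (assumptions noted in comments).

    worker_grads: (n, d) array of gradients uploaded by workers, possibly corrupted.
    aux_grads:    (m, d) array of gradients computed on a trusted auxiliary dataset.
    k:            assumed subspace dimension (hypothetical tuning parameter).
    """
    worker_grads = np.asarray(worker_grads, dtype=float)
    aux_grads = np.asarray(aux_grads, dtype=float)
    d = worker_grads.shape[1]
    if not 1 <= k <= d:
        raise ValueError("k must satisfy 1 <= k <= d")

    # Trusted but noisy reference point from the auxiliary data.
    aux_mean = aux_grads.mean(axis=0)

    # Assumption: the subspace is spanned by the directions in which worker
    # gradients deviate most from the auxiliary mean.
    centered = worker_grads - aux_mean
    scatter = centered.T @ centered / worker_grads.shape[0]
    _, eigvecs = np.linalg.eigh(scatter)   # eigenvalues in ascending order
    basis = eigvecs[:, -k:]                # (d, k) top-k directions
    proj = basis @ basis.T                 # orthogonal projector onto the subspace

    # Components inside the subspace come from the auxiliary data;
    # components perpendicular to it come from the worker gradients.
    worker_mean = worker_grads.mean(axis=0)
    inside = proj @ aux_mean
    outside = worker_mean - proj @ worker_mean
    return inside + outside
```

The intuition behind this assumed subspace choice is that corrupted gradients which shift the mean strongly in some direction also inflate the spread in that direction, so the direction falls inside the subspace and is handled by the trusted auxiliary data instead of by the worker uploads.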