High Dimensional Distributed Gradient Descent with Arbitrary Number of Byzantine Attackers

📅 2023-07-25
🏛️ arXiv.org
📈 Citations: 4
Influential: 0
🤖 AI Summary
To address the challenge of robust gradient aggregation in high-dimensional distributed learning under an arbitrary and unknown number of Byzantine adversaries, this paper proposes the first aggregation method that requires no prior knowledge of the number of attackers. The core innovation is a high-dimensional semi-verified mean estimation framework that combines subspace identification with orthogonal projection: gradient components perpendicular to the identified subspace are estimated from the workers' (possibly corrupted) gradients, while components within the subspace are estimated from a clean auxiliary dataset. Theoretically, the method achieves minimax-optimal statistical rates in high dimensions, with an error bound whose dependence on dimensionality is significantly improved over prior work, mitigating the "curse of dimensionality." The guarantees hold regardless of the number of Byzantine workers.
📝 Abstract
Robust distributed learning with Byzantine failures has attracted extensive research interest in recent years. However, most existing methods suffer from the curse of dimensionality, which becomes increasingly serious with the growing complexity of modern machine learning models. In this paper, we design a new method that is suitable for high-dimensional problems under an arbitrary number of Byzantine attackers. The core of our design is a direct high-dimensional semi-verified mean estimation method. Our idea is to first identify a subspace. The components of the mean perpendicular to this subspace can be estimated via gradient vectors uploaded from worker machines, while the components within this subspace are estimated using an auxiliary dataset. We then use our new method as the aggregator in distributed learning problems. Our theoretical analysis shows that the new method achieves minimax-optimal statistical rates. In particular, the dependence on dimensionality is significantly improved compared with previous works.
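The abstract's aggregator-in-the-loop structure can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the function names are invented, and a coordinate-wise median stands in for the paper's semi-verified aggregator just to show where a robust aggregator plugs into distributed gradient descent.

```python
import numpy as np

def byzantine_robust_gd(grad_fns, aggregate, w0, lr=0.1, steps=100):
    """Distributed GD skeleton with a pluggable robust aggregator.

    grad_fns:  one gradient oracle per worker (some may be Byzantine)
    aggregate: maps the stacked worker gradients (n x d) to one d-vector
    """
    w = w0.copy()
    for _ in range(steps):
        G = np.stack([g(w) for g in grad_fns])  # collect worker gradients
        w -= lr * aggregate(G)                  # robust aggregate, then step
    return w
```

With three honest workers and one attacker sending huge gradients, a coordinate-wise median already suppresses the attack in this toy setting; the paper's contribution is an aggregator whose error bound does not degrade with dimension the way median-style rules do.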
Problem

Research questions and friction points this paper is trying to address.

Addresses Byzantine attacks in high-dimensional distributed gradient descent
Overcomes the curse of dimensionality by combining corrupted and clean datasets
Achieves minimax-optimal rates without dimension-dependent error scaling
Innovation

Methods, ideas, or system contributions that make the work stand out.

High-dimensional semi-verified mean estimation method
Subspace identification via large-variance components
Combining corrupted gradients with an auxiliary clean dataset
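The three ideas above can be sketched together. This is a hedged toy version under simplifying assumptions, not the paper's estimator: it takes the top-k eigenvectors of the empirical covariance as the high-variance (suspicious) subspace, uses a plain mean of the clean auxiliary gradients inside that subspace, and a plain worker mean outside it; the subspace dimension `k` and both inner estimators are illustrative choices.

```python
import numpy as np

def semi_verified_mean(worker_grads, aux_grads, k):
    """Toy semi-verified mean estimate (illustrative, not the paper's method).

    worker_grads: n x d gradients from workers (some possibly Byzantine)
    aux_grads:    m x d gradients from a small clean auxiliary dataset
    k:            dimension of the high-variance subspace to distrust
    """
    mu_w = worker_grads.mean(axis=0)
    centered = worker_grads - mu_w
    # Empirical covariance; attackers can inflate variance along few directions
    C = centered.T @ centered / len(worker_grads)
    # Top-k eigenvectors span the suspicious high-variance subspace V
    _, vecs = np.linalg.eigh(C)          # eigenvalues in ascending order
    V = vecs[:, -k:]                     # d x k orthonormal basis of V
    P = V @ V.T                          # orthogonal projector onto V
    # Inside V: trust only the clean auxiliary data
    inside = P @ aux_grads.mean(axis=0)
    # Outside V: the worker mean is usable (attack energy concentrates in V)
    outside = (np.eye(C.shape[0]) - P) @ mu_w
    return inside + outside
```

The design point this sketch conveys is the split: the auxiliary dataset only has to fix a k-dimensional subspace rather than all d coordinates, which is why the resulting error bound can avoid the usual dimension dependence.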