Byzantine-Robust Distributed SGD: A Unified Analysis and Tight Error Bounds

📅 2026-04-11
📈 Citations: 0 · Influential citations: 0
🤖 AI Summary
This work addresses the lack of a unified convergence analysis for Byzantine-robust distributed optimization under general data heterogeneity. It establishes a theoretical framework covering both the momentum and non-momentum variants of Byzantine-robust distributed stochastic gradient descent (SGD). For nonconvex smooth objectives and objectives satisfying the Polyak–Łojasiewicz condition, the study derives, for the first time under a general heterogeneity assumption, convergence upper bounds together with matching lower bounds, showing that the upper bounds are tight. These bounds reveal that local momentum provably suppresses the error component induced by stochastic noise, and they characterize the fundamental performance limits of Byzantine-robust learning under both stochasticity and data heterogeneity. Empirical experiments validate the efficacy of local momentum in enhancing robustness.
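To make the setting concrete, below is a minimal sketch of Byzantine-robust distributed SGD with local momentum, assuming coordinate-wise median as the robust aggregation rule, a toy heterogeneous quadratic objective per worker, and hypothetical hyperparameters; the paper's actual aggregation rules, objectives, and attack models may differ.

```python
# Minimal sketch: Byzantine-robust distributed SGD with local momentum.
# Assumptions (not taken from the paper): coordinate-wise median aggregation,
# per-worker quadratic objectives f_i(x) = 0.5 * ||x - a_i||^2, a naive
# "large random vector" Byzantine attack, and toy hyperparameters.
import numpy as np

rng = np.random.default_rng(0)

DIM = 10            # model dimension (toy setting)
N_WORKERS = 20      # total number of workers
N_BYZANTINE = 4     # workers sending arbitrary vectors
STEPS = 200
LR = 0.05
BETA = 0.9          # local momentum coefficient
NOISE_STD = 1.0     # stochastic gradient noise level
HETEROGENEITY = 0.5 # spread of per-worker optima (data heterogeneity)

# Heterogeneous local objectives: each honest worker has its own optimum a_i.
local_optima = HETEROGENEITY * rng.normal(size=(N_WORKERS, DIM))

def stochastic_grad(worker, x):
    """Noisy gradient of worker's local quadratic objective."""
    return (x - local_optima[worker]) + NOISE_STD * rng.normal(size=DIM)

def coordinate_wise_median(vectors):
    """Robust aggregation: median of each coordinate across workers."""
    return np.median(np.stack(vectors), axis=0)

x = np.zeros(DIM)
momentum = np.zeros((N_WORKERS, DIM))  # one local momentum buffer per worker

for t in range(STEPS):
    messages = []
    for i in range(N_WORKERS):
        if i < N_BYZANTINE:
            # Byzantine workers send arbitrary (here: large random) vectors.
            messages.append(10.0 * rng.normal(size=DIM))
        else:
            g = stochastic_grad(i, x)
            # Local momentum averages past gradients, damping stochastic noise.
            momentum[i] = BETA * momentum[i] + (1 - BETA) * g
            messages.append(momentum[i])
    # The server aggregates robustly instead of taking a plain average.
    x = x - LR * coordinate_wise_median(messages)

honest_mean = local_optima[N_BYZANTINE:].mean(axis=0)
print("distance to honest workers' average optimum:",
      np.linalg.norm(x - honest_mean))
```

Pushing BETA toward 1 averages more past gradients into each honest worker's message, which is the mechanism by which local momentum reduces the stochasticity-induced error component; the error floor caused by data heterogeneity is unaffected in this sketch, consistent with the summary above.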

📝 Abstract
Byzantine-robust distributed optimization relies on robust aggregation rules to mitigate the influence of malicious Byzantine workers. Despite the proliferation of such rules, a unified convergence analysis framework that accommodates general data heterogeneity is lacking. In this work, we provide a thorough convergence theory of Byzantine-robust distributed stochastic gradient descent (SGD), analyzing variants both with and without local momentum. We establish the convergence rates for nonconvex smooth objectives and those satisfying the Polyak–Łojasiewicz condition under a general data heterogeneity assumption. Our analysis reveals that while stochasticity and data heterogeneity introduce unavoidable error floors, local momentum provably reduces the error component induced by stochasticity. Furthermore, we derive matching lower bounds to demonstrate that the upper bounds obtained in our analysis are tight and characterize the fundamental limits of Byzantine resilience under stochasticity and data heterogeneity. Empirical results support our theoretical findings.
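For reference, a standard statement of the Polyak–Łojasiewicz condition invoked in the abstract is given below; the symbol μ and this particular form follow the usual convention and are not necessarily the paper's notation.

```latex
% Polyak-Lojasiewicz (PL) condition for a smooth objective f with minimum
% value f^*; mu > 0 is the PL constant (notation assumed, not the paper's).
\[
  \frac{1}{2}\,\lVert \nabla f(x) \rVert^{2} \;\ge\; \mu \bigl( f(x) - f^{*} \bigr)
  \qquad \text{for all } x .
\]
```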
Problem

Research questions and friction points this paper is trying to address.

Byzantine-robust distributed SGD · data heterogeneity · convergence analysis · stochastic optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Byzantine-robust optimization · distributed SGD · data heterogeneity · local momentum · tight convergence bounds
Boyuan Ruan
School of Advanced Interdisciplinary Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
Xiaoyu Wang
School of Mathematical Sciences, University of Chinese Academy of Sciences
Optimization · Machine Learning
Ya-Feng Liu
Ministry of Education Key Laboratory of Mathematics and Information Networks, School of Mathematical Sciences, Beijing University of Posts and Telecommunications, Beijing 102206, China