Byzantine-Robust Optimization under $(L_0, L_1)$-Smoothness

📅 2026-03-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses distributed optimization under Byzantine attacks in settings where the objective exhibits $(L_0, L_1)$-smoothness, i.e., where the gradient Lipschitz constant depends on the current iterate rather than being a fixed constant. The paper proposes Byz-NSGDM, the first algorithm to bring $(L_0, L_1)$-smoothness into a Byzantine-robust optimization framework. Byz-NSGDM combines normalized momentum-based stochastic gradient descent with a robust aggregation mechanism built on Nearest Neighbor Mixing (NNM), and effectively mitigates diverse Byzantine attacks. The theoretical analysis establishes a convergence rate of $O(K^{-1/4})$ up to a bias floor induced by the Byzantine workers, and empirical evaluations on heterogeneous MNIST classification, synthetic optimization tasks, and small-scale GPT language modeling demonstrate the algorithm's robustness and effectiveness.
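
As a rough sketch only (not the paper's reference implementation), the snippet below shows how these pieces could fit together in one server step: per-worker momentum, NNM pre-aggregation followed by a coordinate-wise median (an illustrative choice of robust aggregator; the paper may use a different rule), and a normalized update. All function names, the signature of byz_nsgdm_step, and the hyperparameter values are assumptions made for illustration.

```python
import numpy as np

def nnm(vectors, f):
    """Nearest Neighbor Mixing (pre-aggregation): replace each reported vector
    by the mean of its (n - f) nearest neighbors, itself included."""
    n = len(vectors)
    V = np.stack(vectors)                                    # shape (n, d)
    dists = np.linalg.norm(V[:, None, :] - V[None, :, :], axis=-1)
    mixed = np.empty_like(V)
    for i in range(n):
        nearest = np.argsort(dists[i])[: n - f]               # n - f closest (incl. i)
        mixed[i] = V[nearest].mean(axis=0)
    return mixed

def coordinate_wise_median(V):
    """An illustrative robust aggregator applied after NNM."""
    return np.median(V, axis=0)

def byz_nsgdm_step(x, momenta, grads, f, lr=0.01, beta=0.9):
    """One illustrative Byz-NSGDM-style step (hypothetical signature).

    momenta, grads: lists of per-worker vectors; up to f entries may be Byzantine.
    """
    # 1) Each worker updates and reports a local momentum vector.
    momenta = [beta * m + (1.0 - beta) * g for m, g in zip(momenta, grads)]
    # 2) The server aggregates robustly: NNM followed by coordinate-wise median.
    agg = coordinate_wise_median(nnm(momenta, f))
    # 3) Normalized update: the step length is lr regardless of the gradient scale.
    x = x - lr * agg / (np.linalg.norm(agg) + 1e-12)
    return x, momenta
```

The normalization in step 3 caps the per-iteration movement, which is the standard way normalized methods cope with the gradient-dependent curvature allowed by $(L_0, L_1)$-smoothness.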

📝 Abstract
We consider distributed optimization under Byzantine attacks in the presence of $(L_0,L_1)$-smoothness, a generalization of standard $L$-smoothness that captures functions with state-dependent gradient Lipschitz constants. We propose Byz-NSGDM, a normalized stochastic gradient descent method with momentum that achieves robustness against Byzantine workers while maintaining convergence guarantees. Our algorithm combines momentum normalization with Byzantine-robust aggregation enhanced by Nearest Neighbor Mixing (NNM) to handle both the challenges posed by $(L_0,L_1)$-smoothness and Byzantine adversaries. We prove that Byz-NSGDM achieves a convergence rate of $O(K^{-1/4})$ up to a Byzantine bias floor proportional to the robustness coefficient and gradient heterogeneity. Experimental validation on heterogeneous MNIST classification, synthetic $(L_0,L_1)$-smooth optimization, and character-level language modeling with a small GPT model demonstrates the effectiveness of our approach against various Byzantine attack strategies. An ablation study further shows that Byz-NSGDM is robust across a wide range of momentum and learning rate choices.
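
For context (and not as a verbatim statement of the paper's assumption), $(L_0, L_1)$-smoothness is commonly stated in the generalized-smoothness literature in one of the two forms below; the paper may assume a variant of either.

```latex
% Commonly used forms of (L_0, L_1)-smoothness; the paper may assume a variant.
% Second-order form (for twice-differentiable f):
\[
  \|\nabla^2 f(x)\| \;\le\; L_0 + L_1 \,\|\nabla f(x)\| .
\]
% First-order form, often used to avoid second derivatives:
\[
  \|\nabla f(y) - \nabla f(x)\| \;\le\; \bigl(L_0 + L_1 \|\nabla f(x)\|\bigr)\,\|y - x\|
  \quad \text{whenever } \|y - x\| \le \tfrac{1}{L_1}.
\]
% Setting L_1 = 0 recovers standard L-smoothness with L = L_0.
```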
Problem

Research questions and friction points this paper is trying to address.

Byzantine-Robust Optimization
Distributed Optimization
$(L_0, L_1)$-Smoothness
Byzantine Attacks
Non-convex Optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Byzantine-Robust Optimization
$(L_0, L_1)$-Smoothness
Normalized SGD with Momentum
Nearest Neighbor Mixing
Distributed Learning