From Inexact Gradients to Byzantine Robustness: Acceleration and Optimization under Similarity

📅 2026-02-03
📈 Citations: 0
Influential citations: 0
🤖 AI Summary
This work addresses the vulnerability of standard federated learning algorithms to Byzantine nodes, which can cause catastrophic failure. It presents the first unified framework modeling Byzantine-robust distributed optimization as an inexact gradient method subject to both additive and multiplicative errors. Within this framework, two novel algorithms are proposed: one leveraging Nesterov acceleration and the other integrating an optimization similarity assumption with a robust aggregation mechanism. Theoretical analysis establishes that the proposed methods achieve optimal asymptotic error bounds under Byzantine attacks. Experimental results demonstrate that these algorithms significantly reduce the number of communication rounds required for convergence and outperform existing approaches in both robustness and efficiency.
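The summary's core idea, casting Byzantine-robust optimization as gradient descent with an inexact oracle subject to additive and multiplicative errors, can be illustrated with a minimal sketch. Everything here (function names, the quadratic loss, the error budgets `delta` and `alpha`) is hypothetical and only mimics the generic error model described above, not the paper's code:

```python
# Gradient descent with an inexact gradient oracle: the oracle returns g
# satisfying |g - grad(x)| <= delta + alpha * |grad(x)|
# (additive error delta, multiplicative error alpha).

def grad(x):
    # True gradient of f(x) = x^2, minimized at x = 0.
    return 2.0 * x

def inexact_grad(x, delta=0.01, alpha=0.1):
    # Worst-case perturbation that stays within the error budget.
    g = grad(x)
    return g + delta + alpha * abs(g)

def gd(x0, lr=0.1, steps=200):
    x = x0
    for _ in range(steps):
        x -= lr * inexact_grad(x)
    return x

x_final = gd(5.0)
# x_final ends up in a small neighbourhood of the optimum whose radius
# scales with the additive error delta -- the "asymptotic error" that
# robust methods aim to make optimal.
```

The multiplicative term only slows the contraction, while the additive term determines the radius of the neighbourhood the iterates converge to; this matches the intuition that additive errors set the achievable accuracy floor.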

📝 Abstract
Standard federated learning algorithms are vulnerable to adversarial nodes, a.k.a. Byzantine failures. To address this issue, robust distributed learning algorithms have been developed, which typically replace parameter averaging with robust aggregation rules. While generic conditions on these aggregations exist to guarantee the convergence of (Stochastic) Gradient Descent (SGD), the analyses remain rather ad hoc. This hinders the development of more complex robust algorithms, such as accelerated ones. In this work, we show that Byzantine-robust distributed optimization can, under standard generic assumptions, be cast as optimization with inexact gradient oracles (with both additive and multiplicative error terms), an active field of research. This allows us, for instance, to show directly that GD on top of standard robust aggregation procedures achieves the optimal asymptotic error in the Byzantine setting. Going further, we propose two optimization schemes to speed up convergence. The first is a Nesterov-type accelerated scheme whose proof derives directly from accelerated inexact gradient results applied to our formulation. The second hinges on Optimization under Similarity, in which the server leverages an auxiliary loss function that approximates the global loss. Both approaches drastically reduce the communication complexity compared to previous methods, as we show theoretically and empirically.
Problem

Research questions and friction points this paper is trying to address.

Byzantine robustness
federated learning
distributed optimization
inexact gradients
robust aggregation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Byzantine robustness
inexact gradients
accelerated optimization
federated learning
optimization under similarity
Renaud Gaucher
Centre de Mathématiques Appliquées, École Polytechnique, Institut Polytechnique de Paris, Palaiseau, France; Centre Inria de l'Univ. Grenoble Alpes, CNRS, LJK, Grenoble, France
Aymeric Dieuleveut
Professor, École Polytechnique, France
Statistics · Optimisation · Machine Learning
Hadrien Hendrikx
INRIA Grenoble - LJK - UGA
Optimization · Decentralized systems · Gossip algorithms