Reconciling Communication Compression and Byzantine-Robustness in Distributed Learning

📅 2025-08-23
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Addressing the challenge of jointly achieving communication compression and Byzantine robustness in distributed learning, this paper proposes RoSDHB, a novel decentralized stochastic optimization algorithm. RoSDHB is the first to integrate Polyak momentum with a coordinated compression mechanism, requiring only Lipschitz smoothness of the honest workers' average loss function and thereby dropping the prior, stronger assumption of bounded global Hessian variance. Its coordinated compression strategy mitigates the interference of compression noise with robust aggregation. Convergence is rigorously established under the standard (G, B)-gradient heterogeneity model. Experiments on standard image classification benchmarks show that RoSDHB achieves substantial communication compression while matching the Byzantine robustness of the state-of-the-art algorithm Byz-DASHA-PAGE.

📝 Abstract
Distributed learning (DL) enables scalable model training over decentralized data, but remains challenged by Byzantine faults and high communication costs. While both issues have been studied extensively in isolation, their interaction is less explored. Prior work shows that naively combining communication compression with Byzantine-robust aggregation degrades resilience to faulty nodes (or workers). The state-of-the-art algorithm, namely Byz-DASHA-PAGE [29], makes use of the momentum variance reduction scheme to mitigate the detrimental impact of compression noise on Byzantine-robustness. We propose a new algorithm, named RoSDHB, that integrates the classic Polyak's momentum with a new coordinated compression mechanism. We show that RoSDHB performs comparably to Byz-DASHA-PAGE under the standard (G, B)-gradient dissimilarity heterogeneity model, while it relies on fewer assumptions. In particular, we only assume Lipschitz smoothness of the average loss function of the honest workers, in contrast to [29], which additionally assumes bounded global Hessian variance. Empirical results on benchmark image classification tasks show that RoSDHB achieves strong robustness with significant communication savings.
Problem

Research questions and friction points this paper is trying to address.

Addressing communication compression and Byzantine faults in distributed learning
Mitigating compression noise impact on Byzantine-robust aggregation
Reducing algorithm assumptions while maintaining robustness and efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates Polyak's momentum technique
Uses coordinated compression mechanism
Requires fewer assumptions than prior work (only Lipschitz smoothness of the honest workers' average loss)
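To make the two core ingredients concrete, here is a minimal sketch of a Polyak (heavy-ball) momentum update applied to a compressed gradient. The top-k sparsifier and the single-worker loop are illustrative assumptions for exposition; the paper's actual coordinated compression and Byzantine-robust aggregation across workers are more involved than this sketch.

```python
import numpy as np

def topk_compress(v, k):
    """Keep only the k largest-magnitude entries (a common sparsifying compressor)."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def polyak_momentum_step(x, m, grad, lr=0.1, beta=0.9, k=2):
    """One heavy-ball (Polyak momentum) step on a compressed gradient.

    Illustrative only: RoSDHB additionally coordinates compression across
    workers and feeds the results into a robust aggregator.
    """
    g = topk_compress(grad, k)   # worker transmits only a sparse gradient
    m = beta * m + g             # momentum accumulates past (compressed) gradients
    x = x - lr * m               # parameter update
    return x, m

# Toy usage on f(x) = 0.5 * ||x||^2, whose gradient at x is x itself.
x = np.array([1.0, -2.0, 0.5, 3.0])
m = np.zeros_like(x)
for _ in range(50):
    x, m = polyak_momentum_step(x, m, grad=x.copy(), k=2)
```

The momentum buffer averages gradients over time, which is what dampens the per-step compression noise that would otherwise mislead a robust aggregator.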