🤖 AI Summary
Federated learning (FL) suffers from slow convergence and severe bias when Byzantine attacks coexist with Non-IID data. To address this, we propose Fed-NGA, a lightweight gradient-normalization-based aggregation mechanism that is the first to achieve dual adaptivity to both non-convex and strongly convex loss functions under Non-IID data, theoretically attaining a zero optimality gap. Fed-NGA normalizes local gradients to unit vectors before aggregation, simultaneously ensuring Byzantine robustness and computational efficiency, with an optimal time complexity of O(pM), where p is the model dimension and M is the number of clients. We establish rigorous convergence guarantees: O(1/T^{1/2−δ}) for non-convex objectives and linear convergence for strongly convex ones. Extensive experiments demonstrate that Fed-NGA significantly outperforms state-of-the-art Byzantine-robust FL methods in both training speed and final model accuracy, while maintaining strong robustness against adversarial clients.
📝 Abstract
In practical federated learning (FL) systems, the presence of malicious Byzantine attacks and data heterogeneity often introduces biases into the learning process. However, existing Byzantine-robust methods typically achieve only a compromise between adaptability to different loss function types (both strongly convex and non-convex) and robustness to heterogeneous datasets, and they retain a non-zero optimality gap. Moreover, this compromise often comes at the cost of high computational complexity for aggregation, which significantly slows down training. To address this challenge, we propose a federated learning approach called the Federated Normalized Gradients Algorithm (Fed-NGA). Fed-NGA simply normalizes the uploaded local gradients to unit vectors before aggregation, achieving a time complexity of $\mathcal{O}(pM)$, where $p$ is the dimension of the model parameters and $M$ is the number of participating clients. This complexity scale is the best among all existing Byzantine-robust methods. Furthermore, through rigorous proof, we demonstrate that Fed-NGA transcends both the trade-off between adaptability to loss function type and data heterogeneity and the limitation of a non-zero optimality gap in the existing literature. Specifically, Fed-NGA can adapt to both non-convex loss functions and non-IID datasets simultaneously, with a zero optimality gap achieved at a rate of $\mathcal{O}(1/T^{\frac{1}{2} - \delta})$, where $T$ is the number of iterations and $\delta \in (0,\frac{1}{2})$. When the loss function is strongly convex, the rate at which the zero optimality gap is achieved improves to linear. Experimental results provide evidence of the superiority of our proposed Fed-NGA over baseline methods in both time complexity and convergence performance.
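The core aggregation step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name and the toy gradients are hypothetical, and it only shows the normalize-then-average rule whose per-round cost is O(pM).

```python
import math

def fed_nga_aggregate(local_gradients):
    """Normalize each client's gradient to a unit vector, then average.

    Each normalization and the running average touch every coordinate
    exactly once, giving O(p * M) total work for M clients and a
    p-dimensional model.
    """
    M = len(local_gradients)
    p = len(local_gradients[0])
    aggregate = [0.0] * p
    for g in local_gradients:
        norm = math.sqrt(sum(x * x for x in g))
        if norm == 0.0:  # skip an all-zero gradient to avoid dividing by zero
            continue
        for j in range(p):
            aggregate[j] += g[j] / (norm * M)
    return aggregate

# Because only the direction of each gradient survives normalization,
# a Byzantine client scaling its update by 1e6 carries no more weight
# than an honest client.
honest = [1.0, 0.0]
byzantine = [-1e6, 0.0]
print(fed_nga_aggregate([honest, honest, byzantine]))  # → [0.333..., 0.0]
```

The server would then take a gradient-descent step along this aggregate direction; the magnitude-clipping effect of normalization is what bounds the influence of any single adversarial client.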