FNBench: Benchmarking Robust Federated Learning against Noisy Labels

📅 2025-05-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
In federated learning (FL), severe and heterogeneous label noise across clients critically undermines model robustness, yet no unified, systematic benchmark exists for evaluating noise-resilient methods. To address this, we introduce FNBench, the first comprehensive benchmark for evaluating FL robustness under label noise. It covers three realistic noise patterns (synthetic label noise, human annotation errors, and systematic errors), uniformly assessed across six datasets spanning image and text modalities and eighteen state-of-the-art FL methods. We further provide observations on why label noise degrades FL performance and, building on them, propose a representation-aware regularization method that enhances feature discriminability and noise resilience. Experiments demonstrate that this regularization consistently improves generalization across diverse FL algorithms. All code, noise configurations, and evaluation toolchains are publicly released.

📝 Abstract
Robustness to label noise is a significant challenge in federated learning (FL). From a data-centric perspective, the quality of distributed datasets cannot be guaranteed, since annotations from different clients contain complicated label noise of varying degrees, which causes performance degradation. There have been some early attempts to tackle noisy labels in FL, but a benchmark study that comprehensively evaluates their practical performance under unified settings is still lacking. To this end, we propose FNBench, the first benchmark study to provide an experimental investigation covering three diverse label noise patterns: synthetic label noise, imperfect human-annotation errors, and systematic errors. Our evaluation incorporates eighteen state-of-the-art methods over five image recognition datasets and one text classification dataset. Meanwhile, we provide observations to explain why noisy labels impair FL and, based on these observations, exploit a representation-aware regularization method to enhance the robustness of existing methods against noisy labels. Finally, we discuss the limitations of this work and propose three-fold future directions. To facilitate the related communities, our source code is open-sourced at https://github.com/Sprinter1999/FNBench.
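FNBench's exact noise generators live in its repository; as a rough illustration of what "synthetic label noise" means in such benchmarks, the sketch below shows the two standard injection schemes (symmetric flipping and pair flipping). The function names and signatures are hypothetical, not taken from the FNBench codebase.

```python
import random

def inject_symmetric_noise(labels, noise_rate, num_classes, seed=0):
    """With probability noise_rate, flip a label to a uniformly random *other* class."""
    rng = random.Random(seed)
    noisy = list(labels)
    for i, y in enumerate(noisy):
        if rng.random() < noise_rate:
            # Draw from the num_classes - 1 classes that are not y.
            c = rng.randrange(num_classes - 1)
            noisy[i] = c if c < y else c + 1
    return noisy

def inject_pair_noise(labels, noise_rate, num_classes, seed=0):
    """With probability noise_rate, flip class c to its 'pair' (c + 1) % num_classes."""
    rng = random.Random(seed)
    return [(y + 1) % num_classes if rng.random() < noise_rate else y
            for y in labels]
```

In a federated setting, a benchmark would typically apply these per client with client-specific noise rates, which is what makes the noise heterogeneous across the federation.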
Problem

Research questions and friction points this paper is trying to address.

Benchmarking FL robustness against diverse label noise patterns
Evaluating 18 methods on image and text datasets
Proposing regularization to enhance FL noise resilience
Innovation

Methods, ideas, or system contributions that make the work stand out.

FNBench benchmarks FL robustness to noisy labels
Evaluates 18 methods across diverse noise patterns
Proposes representation-aware regularization for robustness
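The abstract does not spell out the form of the representation-aware regularization; one common instantiation of this idea is a class-prototype pull term added to the local training loss, which penalizes representations that drift far from their class centroid. The sketch below is a hypothetical illustration of that pattern, not the paper's actual formulation.

```python
def prototype_regularizer(features, labels, prototypes, lam=0.1):
    """Mean squared distance between each sample's feature vector and its
    class prototype, scaled by lam. Pulling features toward per-class
    prototypes encourages compact, discriminative representation clusters,
    which tends to make noisy samples easier to separate.
    (Hypothetical instantiation; the paper's exact term may differ.)"""
    penalty = 0.0
    for f, y in zip(features, labels):
        p = prototypes[y]
        penalty += sum((fi - pi) ** 2 for fi, pi in zip(f, p))
    return lam * penalty / max(len(features), 1)
```

In practice such a term would be added to the client's classification loss during local training, with the prototypes estimated from (or aggregated across) client feature statistics.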
👥 Authors
Xuefeng Jiang
Institute of Computing Technology, Chinese Academy of Sciences
Weakly-supervised Learning · Distributed Optimization · Autonomous Driving · Noisy Label Learning
Jia Li
Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China, and also with the University of Chinese Academy of Sciences, Beijing, China
Nannan Wu
Huazhong University of Science and Technology, Wuhan, Hubei province, China
Zhiyuan Wu
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China, and also with the University of Chinese Academy of Sciences, Beijing, China
Xujing Li
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China, and also with the University of Chinese Academy of Sciences, Beijing, China
Sheng Sun
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Yuwei Wang
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Gang Xu
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Qi Li
Institution for Network Sciences and Cyberspace, Tsinghua University, Beijing, China
Min Liu
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China, and also with the Zhongguancun Laboratory, Beijing, China