🤖 AI Summary
In federated learning, label noise exacerbates model degradation due to data heterogeneity and client isolation, yet no standardized benchmark exists for systematic evaluation. Method: We introduce FedNoisy—the first standardized benchmark for federated learning with noisy labels—encompassing six datasets and twenty representative heterogeneous noise scenarios, including symmetric/asymmetric noise, client-specific noise rates, multi-granularity partitions, and label-flipping simulations, supported by a unified noise-simulation pipeline. We release an open-source PyTorch framework with implementations of nine baseline algorithms. Contribution/Results: Extensive experiments reveal that state-of-the-art label-noise robust methods suffer significant performance degradation under federation. Crucially, we provide the first empirical evidence that heterogeneity in noise distribution—not just data distribution—is a primary driver of aggregation bias. FedNoisy establishes a reproducible foundation for developing and evaluating robust federated learning algorithms.
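The symmetric and asymmetric label-flipping simulations mentioned above can be sketched as follows. This is a minimal illustration of the two standard noise models, not FedNoisy's actual pipeline; the function names and signatures are hypothetical:

```python
import numpy as np

def add_symmetric_noise(labels, noise_rate, num_classes, seed=0):
    """Symmetric noise: with probability `noise_rate`, replace a label
    with a different class chosen uniformly at random."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels).copy()
    flip = rng.random(len(labels)) < noise_rate
    for i in np.flatnonzero(flip):
        candidates = [c for c in range(num_classes) if c != labels[i]]
        labels[i] = rng.choice(candidates)
    return labels

def add_asymmetric_noise(labels, noise_rate, flip_map, seed=0):
    """Asymmetric noise: with probability `noise_rate`, flip a label to a
    fixed, class-dependent target (e.g. truck -> automobile on CIFAR-10),
    given by `flip_map`; classes absent from the map are left unchanged."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels).copy()
    flip = rng.random(len(labels)) < noise_rate
    for i in np.flatnonzero(flip):
        labels[i] = flip_map.get(int(labels[i]), labels[i])
    return labels
```

In a federated simulation, client-specific noise rates can be obtained by calling these functions per client with different `noise_rate` values, which produces the heterogeneous noise distributions the benchmark studies.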
📝 Abstract
Federated learning has gained popularity for enabling distributed learning without aggregating sensitive data from clients. However, the distributed and isolated nature of client data raises concerns about data quality, making federated models more vulnerable to noisy labels. Many efforts have been made to defend against the negative impact of noisy labels in centralized or federated settings, but no existing benchmark comprehensively covers the impact of noisy labels across a wide variety of typical FL settings. In this work, we present the first standardized benchmark that helps researchers fully explore potential federated noisy-label settings. We also conduct comprehensive experiments to characterize these data settings and compare baselines, which may guide method development in the future. Our benchmark provides 20 basic settings over 6 datasets and a standardized simulation pipeline for federated noisy label learning, including implementations of 9 baselines. We hope this benchmark can facilitate idea verification in federated learning with noisy labels. FedNoisy is available at https://github.com/SMILELab-FL/FedNoisy.