Stratify: Rethinking Federated Learning for Non-IID Data through Balanced Sampling

📅 2025-04-18

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

To address the core challenges of federated learning under non-IID data (severe model bias, slow convergence, and imbalanced class and feature distributions), this paper proposes Stratify, a framework built around a Stratified Label Schedule (SLS). Inspired by classical stratified sampling, SLS balances exposure across labels during training, and a label-aware client selection strategy restricts participation to clients that hold data for the scheduled labels. Stratify also uses a fine-grained, high-frequency model update scheme and a secure client selection protocol based on homomorphic encryption, which computes global label statistics without disclosing individual clients' information. Evaluated on seven benchmarks, from MNIST to Tiny-ImageNet, Stratify attains performance comparable to IID baselines, accelerates convergence, and reduces client-side computation compared to state-of-the-art methods.


📝 Abstract

Federated Learning (FL) on non-independently and identically distributed (non-IID) data remains a critical challenge, as existing approaches struggle with severe data heterogeneity. Current methods primarily address symptoms of non-IID by applying incremental adjustments to Federated Averaging (FedAvg), rather than directly resolving its inherent design limitations. Consequently, performance significantly deteriorates under highly heterogeneous conditions, as the fundamental issue of imbalanced exposure to diverse class and feature distributions remains unresolved. This paper introduces Stratify, a novel FL framework designed to systematically manage class and feature distributions throughout training, effectively tackling the root cause of non-IID challenges. Inspired by classical stratified sampling, our approach employs a Stratified Label Schedule (SLS) to ensure balanced exposure across labels, significantly reducing bias and variance in aggregated gradients. Complementing SLS, we propose a label-aware client selection strategy, restricting participation exclusively to clients possessing data relevant to scheduled labels. Additionally, Stratify incorporates a fine-grained, high-frequency update scheme, accelerating convergence and further mitigating data heterogeneity. To uphold privacy, we implement a secure client selection protocol leveraging homomorphic encryption, enabling precise global label statistics without disclosing sensitive client information. Extensive evaluations on MNIST, CIFAR-10, CIFAR-100, Tiny-ImageNet, COVTYPE, PACS, and Digits-DG demonstrate that Stratify attains performance comparable to IID baselines, accelerates convergence, and reduces client-side computation compared to state-of-the-art methods, underscoring its practical effectiveness in realistic federated learning scenarios.
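The abstract does not spell out how the Stratified Label Schedule or the label-aware client selection are implemented. As a minimal sketch of the general idea only, the Python below assumes a schedule that visits every label exactly once per pass (in shuffled order) and a selector that picks uniformly among clients holding the scheduled label. The function names, the `client_label_counts` mapping, and the uniform choice are all illustrative assumptions, not the paper's algorithm.

```python
import random


def stratified_label_schedule(labels, num_passes, rng):
    """Build a label schedule in which each label appears once per pass.

    Assumption: equal exposure per label is enforced pass by pass; the
    paper's actual schedule may weight or order labels differently.
    """
    schedule = []
    for _ in range(num_passes):
        order = list(labels)
        rng.shuffle(order)  # shuffle within the pass, never across passes
        schedule.extend(order)
    return schedule


def eligible_clients(client_label_counts, label):
    """Return the ids of clients that hold at least one sample of `label`."""
    return sorted(
        cid
        for cid, counts in client_label_counts.items()
        if counts.get(label, 0) > 0
    )


def select_client(client_label_counts, label, rng):
    """Pick one eligible client uniformly at random, or None if no one qualifies."""
    candidates = eligible_clients(client_label_counts, label)
    if not candidates:
        return None
    return rng.choice(candidates)


# Hypothetical non-IID split: each client sees only a subset of the labels.
demo_counts = {
    "client_a": {0: 50, 1: 5},
    "client_b": {1: 40, 2: 60},
    "client_c": {2: 10},
}
demo_rng = random.Random(0)
demo_schedule = stratified_label_schedule([0, 1, 2], num_passes=2, rng=demo_rng)
demo_plan = [(label, select_client(demo_counts, label, demo_rng)) for label in demo_schedule]
```

In this sketch `demo_plan` pairs each scheduled label with a client that actually holds it, so no round is spent on a client with no relevant data. In the plain version above the selector reads per-client label counts directly; the abstract's privacy protocol exists precisely to avoid exposing those counts.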

Problem

Research questions and friction points this paper is trying to address.

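Performance of federated learning degrades sharply on highly heterogeneous (non-IID) data

Existing methods adjust FedAvg incrementally and leave imbalanced exposure to class and feature distributions unresolved

Gathering global label statistics for client selection must not disclose sensitive client information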

Innovation

Methods, ideas, or system contributions that make the work stand out.

Stratified Label Schedule balances label exposure

Label-aware client selection enhances data relevance

Secure client selection uses homomorphic encryption
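The abstract states only that homomorphic encryption is used to obtain global label statistics; it does not name the scheme or the protocol. As an illustration of how an additively homomorphic scheme can sum per-client label counts without revealing any single client's vector, here is a toy Paillier sketch in pure Python. The choice of Paillier, the tiny hard-coded primes, the use of the non-cryptographic `random` module, and the assumption that a separate key holder decrypts only the aggregate are all assumptions made for the example, and the parameters are far too small to be secure.

```python
import math
import random


def keygen(p=1789, q=1861):
    """Toy Paillier key generation with tiny fixed primes (insecure, demo only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1
    # mu is the inverse of L(g^lam mod n^2) modulo n, where L(x) = (x - 1) // n
    mu = pow((pow(g, lam, n * n) - 1) // n, -1, n)
    return (n, g), (lam, mu)


def encrypt(pub, m, rng):
    """Encrypt a non-negative integer m < n."""
    n, g = pub
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)


def decrypt(pub, priv, c):
    """Recover the plaintext from ciphertext c."""
    n, _ = pub
    lam, mu = priv
    return ((pow(c, lam, n * n) - 1) // n * mu) % n


def encrypt_counts(pub, counts, rng):
    """Encrypt a per-label count vector element by element."""
    return [encrypt(pub, m, rng) for m in counts]


def aggregate(pub, encrypted_vectors):
    """Multiply ciphertexts per label; this adds the underlying plaintext counts."""
    n, _ = pub
    totals = [1] * len(encrypted_vectors[0])
    for vec in encrypted_vectors:
        totals = [(t * c) % (n * n) for t, c in zip(totals, vec)]
    return totals


def decrypt_counts(pub, priv, encrypted_totals):
    """Decrypt the aggregated vector to obtain global label counts."""
    return [decrypt(pub, priv, c) for c in encrypted_totals]
```

With this sketch, the aggregating party only multiplies ciphertexts and never sees an individual client's counts; whoever holds the private key learns just the per-label totals. A real deployment would use a vetted library with large keys and would need to decide who holds the key, which the abstract does not specify.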
