Disentangling data distribution for Federated Learning

📅 2024-10-16

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

career value

198K/year

🤖 AI Summary

To address the challenges of high communication overhead, degraded model performance, and weakened privacy guarantees arising from heterogeneous data distributions across clients in federated learning, this paper provides the first theoretical proof that decoupling and reconstructing each client’s data distribution enables federated training to achieve the theoretical efficiency of centralized training within a single communication round. We propose FedDistr, a novel algorithm that leverages Stable Diffusion models for distribution-level decoupling and generative reconstruction, integrated within a privacy-preserving federated framework. Experiments on CIFAR-100 and DomainNet demonstrate significant improvements in accuracy and convergence speed; utility under a single round of communication closely matches that of centralized training, while strictly adhering to the “data never leaves the client” privacy constraint. Our core contribution is establishing the first theoretical linkage between distribution decoupling and communication efficiency, and realizing the first generative federated learning paradigm that simultaneously achieves provable optimality and practical effectiveness.

Technology Category

Application Category

📝 Abstract

Federated Learning (FL) facilitates collaborative training of a global model whose performance is boosted by private data owned by distributed clients, without compromising data privacy. Yet the wide applicability of FL is hindered by entanglement of data distributions across different clients. This paper demonstrates for the first time that by disentangling data distributions FL can in principle achieve efficiencies comparable to those of distributed systems, requiring only one round of communication. To this end, we propose a novel FedDistr algorithm, which employs stable diffusion models to decouple and recover data distributions. Empirical results on the CIFAR100 and DomainNet datasets show that FedDistr significantly enhances model utility and efficiency in both disentangled and near-disentangled scenarios while ensuring privacy, outperforming traditional federated learning methods.

Problem

Research questions and friction points this paper is trying to address.

Federated Learning

Data Distribution Disparity

Communication Efficiency

Innovation

Methods, ideas, or system contributions that make the work stand out.

FedDistr

Stable Diffusion Model

Privacy Preservation

🔎 Similar Papers

A Survey on Group Fairness in Federated Learning: Challenges, Taxonomy of Solutions and Directions for Future Research