Personalized Federated Training of Diffusion Models with Privacy Guarantees

📅 2025-04-01
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the scarcity of real-world data in privacy- and compliance-sensitive domains (e.g., healthcare, finance), this paper proposes the first decentralized federated training framework for diffusion-based synthetic data generation. Our method innovatively co-designs personalized modeling with the intrinsic noise schedule of diffusion processes, integrating adaptive differential privacy (via gradient clipping and calibrated noise injection), personalized model aggregation, and Denoising Diffusion Probabilistic Model (DDPM) training to achieve high-fidelity, fair synthetic data generation under strong privacy guarantees. Evaluated under highly heterogeneous (Non-IID) data settings, our approach improves downstream task accuracy by up to 12.3% and reduces class-wise bias by 37.6% compared to local isolated training—effectively mitigating model bias and class imbalance induced by data heterogeneity.

Technology Category

Application Category

📝 Abstract
The scarcity of accessible, compliant, and ethically sourced data presents a considerable challenge to the adoption of artificial intelligence (AI) in sensitive fields like healthcare, finance, and biomedical research. Furthermore, access to unrestricted public datasets is increasingly constrained due to rising concerns over privacy, copyright, and competition. Synthetic data has emerged as a promising alternative, and diffusion models -- a cutting-edge generative AI technology -- provide an effective solution for generating high-quality and diverse synthetic data. In this paper, we introduce a novel federated learning framework for training diffusion models on decentralized private datasets. Our framework leverages personalization and the inherent noise in the forward diffusion process to produce high-quality samples while ensuring robust differential privacy guarantees. Our experiments show that our framework outperforms non-collaborative training methods, particularly in settings with high data heterogeneity, and effectively reduces biases and imbalances in synthetic data, resulting in fairer downstream models.
Problem

Research questions and friction points this paper is trying to address.

Training diffusion models on decentralized private datasets
Ensuring privacy guarantees in synthetic data generation
Reducing biases and imbalances in synthetic data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Federated learning for decentralized diffusion models
Personalization and noise enhance privacy guarantees
Reduces biases in synthetic data effectively
🔎 Similar Papers
No similar papers found.