Personalized Federated Training of Diffusion Models with Privacy Guarantees

📅 2025-04-01

📈 Citations: 0

✨ Influential: 0

career value

213K/year

🤖 AI Summary

To address the scarcity of real-world data in privacy- and compliance-sensitive domains (e.g., healthcare, finance), this paper proposes the first decentralized federated training framework for diffusion-based synthetic data generation. Our method innovatively co-designs personalized modeling with the intrinsic noise schedule of diffusion processes, integrating adaptive differential privacy (via gradient clipping and calibrated noise injection), personalized model aggregation, and Denoising Diffusion Probabilistic Model (DDPM) training to achieve high-fidelity, fair synthetic data generation under strong privacy guarantees. Evaluated under highly heterogeneous (Non-IID) data settings, our approach improves downstream task accuracy by up to 12.3% and reduces class-wise bias by 37.6% compared to local isolated training—effectively mitigating model bias and class imbalance induced by data heterogeneity.

Technology Category

Application Category

📝 Abstract

The scarcity of accessible, compliant, and ethically sourced data presents a considerable challenge to the adoption of artificial intelligence (AI) in sensitive fields like healthcare, finance, and biomedical research. Furthermore, access to unrestricted public datasets is increasingly constrained due to rising concerns over privacy, copyright, and competition. Synthetic data has emerged as a promising alternative, and diffusion models -- a cutting-edge generative AI technology -- provide an effective solution for generating high-quality and diverse synthetic data. In this paper, we introduce a novel federated learning framework for training diffusion models on decentralized private datasets. Our framework leverages personalization and the inherent noise in the forward diffusion process to produce high-quality samples while ensuring robust differential privacy guarantees. Our experiments show that our framework outperforms non-collaborative training methods, particularly in settings with high data heterogeneity, and effectively reduces biases and imbalances in synthetic data, resulting in fairer downstream models.

Problem

Research questions and friction points this paper is trying to address.

Training diffusion models on decentralized private datasets

Ensuring privacy guarantees in synthetic data generation

Reducing biases and imbalances in synthetic data

Innovation

Methods, ideas, or system contributions that make the work stand out.

Federated learning for decentralized diffusion models

Personalization and noise enhance privacy guarantees

Reduces biases in synthetic data effectively

🔎 Similar Papers

Navigating Heterogeneity and Privacy in One-Shot Federated Learning with Diffusion Models