🤖 AI Summary
Federated learning (FL) struggles to support lightweight, on-device personalization of diffusion models because of stringent resource constraints and catastrophic forgetting.
Method: This paper proposes a conditional generative framework comprising a shared backbone network and client-specific lightweight identity embeddings. Theoretically, we establish the first equivalence between conditional diffusion training and maximum likelihood estimation of a Gaussian mixture model, proving that client embeddings effectively steer the shared score network toward personalized data distributions. Practically, only ≤0.01% of parameters (a few hundred) are updated per new client.
Contribution/Results: Our method substantially reduces Kernel Inception Distance (KID), outperforming baselines by a wide margin, while preserving collaborative pre-training performance. It is robust to variations in learning rate and number of communication rounds, and mitigates catastrophic forgetting. Overall, it delivers a scalable, interpretable paradigm for resource-constrained federated generative modeling.
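The backbone-plus-embedding split above can be sketched in a few lines. This is a toy illustration, not the paper's architecture: the linear score map, the dimensions, and all variable names are assumptions made for clarity. The point it shows is that a single shared network, combined with a tiny per-client embedding acting as a bias, yields a different (personalized) score for each client while new clients only add embedding parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not from the paper): data dim d, embedding dim k.
d, k, num_clients = 8, 4, 3

# Shared backbone: here a toy linear score map s(x, e) = W x + U e.
# In SPIRE the backbone is a large score network; W and U stand in for
# its weights, which are frozen after collaborative pretraining.
W = 0.1 * rng.normal(size=(d, d))
U = 0.1 * rng.normal(size=(d, k))

# Lightweight per-client identity embeddings -- the only parameters a
# new client would train (k numbers each, a tiny fraction of |W| + |U|).
embeddings = 0.1 * rng.normal(size=(num_clients, k))

def score(x, client_id):
    """Client-conditioned score: the embedding enters as a bias that
    steers the shared backbone toward that client's distribution."""
    return W @ x + U @ embeddings[client_id]

x = rng.normal(size=d)
s0 = score(x, 0)  # personalized score for client 0
s1 = score(x, 1)  # same input, different client -> different score
```

Adapting to an unseen client in this sketch means appending one new row to `embeddings` and optimizing only its `k` entries, which mirrors the "hundreds of parameters" adaptation regime described above.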
📝 Abstract
Recent advances in diffusion models have revolutionized generative AI, but their sheer size makes on-device personalization, and thus effective federated learning (FL), infeasible. We propose Shared Backbone Personal Identity Representation Embeddings (SPIRE), a framework that casts per-client diffusion-based generation as conditional generation in FL. SPIRE factorizes the network into (i) a high-capacity global backbone that learns a population-level score function and (ii) lightweight, learnable client embeddings that encode local data statistics. This separation enables parameter-efficient fine-tuning that touches $\leq 0.01\%$ of weights. We provide the first theoretical bridge between conditional diffusion training and maximum likelihood estimation in Gaussian mixture models. For a two-component mixture we prove that gradient descent on the DDPM loss with respect to the mixing weights recovers the optimal mixing weights and enjoys dimension-free error bounds. Our analysis also hints at how client embeddings act as biases that steer a shared score network toward personalized distributions. Empirically, SPIRE matches or surpasses strong baselines during collaborative pretraining, and vastly outperforms them when adapting to unseen clients, reducing Kernel Inception Distance while updating only hundreds of parameters. SPIRE further mitigates catastrophic forgetting and remains robust across fine-tuning learning-rate and epoch choices.
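The "embeddings act as biases" intuition has a clean closed form in the simplest setting the abstract mentions. Under illustrative assumptions not spelled out above (a two-component mixture with symmetric means $\pm\mu$ and identity covariance), the score of $p_w(x) = w\,\mathcal{N}(x;\mu, I) + (1-w)\,\mathcal{N}(x;-\mu, I)$ is a standard computation:

```latex
% Score of a symmetric two-component Gaussian mixture
% (illustrative assumptions; the paper's exact setup may differ):
%   p_w(x) = w\,\mathcal{N}(x;\mu, I) + (1-w)\,\mathcal{N}(x;-\mu, I)
\nabla_x \log p_w(x)
  = -x + \mu \tanh\!\Big( \langle \mu, x \rangle
      + \tfrac{1}{2} \log \tfrac{w}{1-w} \Big)
```

The mixing weight $w$ enters only through the scalar shift $\tfrac{1}{2}\log\tfrac{w}{1-w}$ inside the $\tanh$: changing the client's distribution amounts to adding a bias to an otherwise shared score function, which is exactly the mechanism the abstract attributes to client embeddings.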