RAPID: Retrieval Augmented Training of Differentially Private Diffusion Models

📅 2025-02-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing differentially private diffusion models (DPDMs) suffer from three key bottlenecks: low utility, high GPU memory consumption, and slow inference. To address these, the paper proposes RAPID (Retrieval Augmented PrIvate Diffusion model), the first framework to integrate retrieval-augmented generation (RAG) into DPDM training. RAPID builds a knowledge base of sampling trajectories from publicly available data; during training on private data, it computes the early sampling steps as queries, retrieves similar trajectories via similarity matching to serve as surrogates, and applies differentially private optimization only to the later denoising steps, effectively decoupling trajectory generation from privacy enforcement. Under identical privacy budgets (ε, δ), RAPID achieves substantial improvements over state-of-the-art methods: a 42% reduction in FID (better sample quality), a 63% lower GPU memory footprint, and 58% faster inference, demonstrating joint gains across privacy preservation, utility, and computational efficiency.

📝 Abstract
Differentially private diffusion models (DPDMs) harness the remarkable generative capabilities of diffusion models while enforcing differential privacy (DP) for sensitive data. However, existing DPDM training approaches often suffer from significant utility loss, large memory footprint, and expensive inference cost, impeding their practical uses. To overcome such limitations, we present RAPID: Retrieval Augmented PrIvate Diffusion model, a novel approach that integrates retrieval augmented generation (RAG) into DPDM training. Specifically, RAPID leverages available public data to build a knowledge base of sample trajectories; when training the diffusion model on private data, RAPID computes the early sampling steps as queries, retrieves similar trajectories from the knowledge base as surrogates, and focuses on training the later sampling steps in a differentially private manner. Extensive evaluation using benchmark datasets and models demonstrates that, with the same privacy guarantee, RAPID significantly outperforms state-of-the-art approaches by large margins in generative quality, memory footprint, and inference cost, suggesting that retrieval-augmented DP training represents a promising direction for developing future privacy-preserving generative models. The code is available at: https://github.com/TanqiuJiang/RAPID
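The abstract's core mechanism is a nearest-neighbour lookup: early sampling steps serve as a query, and the most similar public trajectory is returned as a surrogate so that only later steps need DP training. A minimal conceptual sketch of that retrieval step (the embedding function, data shapes, and function names here are hypothetical illustrations, not the paper's implementation):

```python
import numpy as np

def build_knowledge_base(public_samples, embed):
    """Embed early-step sampling trajectories derived from public data.
    Returns (keys, trajectories): keys are the query embeddings used
    for similarity matching; trajectories are the stored surrogates."""
    keys = np.stack([embed(x) for x in public_samples])
    return keys, public_samples

def retrieve_surrogate(query_sample, keys, trajectories, embed):
    """Match the private sample's early sampling steps against the
    knowledge base (cosine similarity) and return the most similar
    public trajectory as a surrogate for the early denoising steps."""
    q = embed(query_sample)
    sims = keys @ q / (np.linalg.norm(keys, axis=1) * np.linalg.norm(q) + 1e-12)
    return trajectories[int(np.argmax(sims))]
```

Because the surrogate comes entirely from public data, the retrieval itself consumes no privacy budget; only the gradient updates for the later sampling steps must be privatized.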
Problem

Research questions and friction points this paper is trying to address.

Reduces utility loss in DP diffusion models
Decreases memory footprint during training
Lowers inference cost for DP models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates retrieval augmented generation
Leverages public data knowledge base
Focuses on later sampling steps
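The last point, training only the later sampling steps under differential privacy, would conventionally use a DP-SGD-style update: clip each example's gradient, average, and add calibrated Gaussian noise. A minimal sketch of one such update, assuming standard DP-SGD rather than any RAPID-specific optimizer (hyperparameter values are placeholders):

```python
import numpy as np

def dp_sgd_step(per_example_grads, clip_norm=1.0, noise_mult=1.0, rng=None):
    """One DP-SGD update on the later-step parameters: clip each
    per-example gradient to clip_norm (bounding sensitivity), average,
    then add Gaussian noise scaled by noise_mult * clip_norm / batch."""
    rng = rng or np.random.default_rng(0)
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))
    mean = np.mean(clipped, axis=0)
    noise = rng.normal(0.0, noise_mult * clip_norm / len(per_example_grads),
                       size=mean.shape)
    return mean + noise
```

Restricting these noisy updates to the later denoising steps is what lets the retrieved public trajectories absorb the early steps, shrinking both the utility loss and the memory cost of per-example gradient clipping.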