🤖 AI Summary
This work addresses a challenge in differentially private (DP) image generation: noise added indiscriminately across all model parameters severely degrades high-frequency detail, making it hard to balance privacy guarantees against generation quality. The authors propose a two-stage DP generation framework operating in the wavelet domain. First, they DP fine-tune an autoregressive spectral image tokenizer model on the low-frequency wavelet coefficients, which are hypothesized to carry the most privacy-sensitive structural information (e.g., facial features and object shapes). Second, a publicly pretrained super-resolution model recovers the high-frequency detail. This coarse-to-fine strategy decouples privacy protection from detail synthesis: the privacy budget is spent only on global structure, while detail refinement relies on the post-processing property of DP and so incurs no additional privacy cost. Experiments on MS-COCO and MM-CelebA-HQ show improved image quality and style capture relative to other leading DP image generation frameworks.
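To make the low-frequency/high-frequency split concrete, here is a minimal NumPy sketch of a one-level 2D Haar wavelet transform. The choice of the Haar wavelet, the single decomposition level, and the function names `haar_dwt2` and `haar_idwt2` are assumptions made for illustration; the summary does not say which wavelet or how many levels the authors use.

```python
import numpy as np

def haar_dwt2(img):
    """One level of an orthonormal 2D Haar transform.

    img: 2D float array with even height and width (illustrative assumption).
    Returns the low-frequency band and a tuple of three detail bands.
    """
    a = img[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = img[0::2, 1::2]  # top-right
    c = img[1::2, 0::2]  # bottom-left
    d = img[1::2, 1::2]  # bottom-right
    low = (a + b + c + d) / 2.0      # coarse structure (scaled block average)
    d_cols = (a - b + c - d) / 2.0   # differences between neighbouring columns
    d_rows = (a + b - c - d) / 2.0   # differences between neighbouring rows
    d_diag = (a - b - c + d) / 2.0   # diagonal differences
    return low, (d_cols, d_rows, d_diag)

def haar_idwt2(low, details):
    """Exact inverse of haar_dwt2."""
    d_cols, d_rows, d_diag = details
    h, w = low.shape
    out = np.empty((2 * h, 2 * w), dtype=low.dtype)
    out[0::2, 0::2] = (low + d_cols + d_rows + d_diag) / 2.0
    out[0::2, 1::2] = (low - d_cols + d_rows - d_diag) / 2.0
    out[1::2, 0::2] = (low + d_cols - d_rows - d_diag) / 2.0
    out[1::2, 1::2] = (low - d_cols - d_rows + d_diag) / 2.0
    return out

rng = np.random.default_rng(0)
img = rng.random((8, 8))           # stand-in for a grayscale image
low, details = haar_dwt2(img)      # low is 4x4: the coarse intermediary
recon = haar_idwt2(low, details)   # perfect reconstruction needs all bands
```

In the framework described above, only the low-frequency band would be seen by the DP fine-tuned model; the detail bands would not be reconstructed from the private data but synthesized by the public super-resolution model.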
📝 Abstract
Generative models trained on sensitive image datasets risk memorizing and reproducing individual training examples, making strong privacy guarantees essential. While differential privacy (DP) provides a principled framework for such guarantees, standard DP finetuning (e.g., with DP-SGD) often results in severe degradation of image quality, particularly in high-frequency textures, due to the indiscriminate addition of noise across all model parameters. In this work, we propose a spectral DP framework based on the hypothesis that the most privacy-sensitive portions of an image are often low-frequency components in the wavelet space (e.g., facial features and object shapes) while high-frequency components are largely generic and public. Based on this hypothesis, we propose the following two-stage framework for DP image generation with coarse image intermediaries: (1) DP finetune an autoregressive spectral image tokenizer model on the low-resolution wavelet coefficients of the sensitive images, and (2) perform high-resolution upsampling using a publicly pretrained super-resolution model. By restricting the privacy budget to the global structures of the image in the first stage, and leveraging the post-processing property of DP for detail refinement, we achieve promising trade-offs between privacy and utility. Experiments on the MS-COCO and MM-CelebA-HQ datasets show that our method generates images with improved quality and style capture relative to other leading DP image frameworks.
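For readers unfamiliar with DP-SGD, the following is a minimal NumPy sketch of the standard per-example clipping and Gaussian noise step that the abstract refers to. It is not the authors' training code: the function name `dp_sgd_gradient` and the parameters `clip_norm` and `noise_multiplier` are illustrative assumptions, and the sketch omits privacy accounting entirely.

```python
import numpy as np

def dp_sgd_gradient(per_example_grads, clip_norm, noise_multiplier, rng):
    """Return a noisy average gradient in the style of standard DP-SGD.

    per_example_grads: array of shape (batch, dim), one gradient per example.
    clip_norm: maximum L2 norm allowed for each per-example gradient.
    noise_multiplier: Gaussian noise std as a multiple of clip_norm.
    """
    batch, dim = per_example_grads.shape
    # Clip each example's gradient so its L2 norm is at most clip_norm.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale
    # Add Gaussian noise calibrated to the clipping bound, then average.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=dim)
    return (clipped.sum(axis=0) + noise) / batch

rng = np.random.default_rng(0)
grads = np.array([[3.0, 4.0],    # norm 5.0, scaled down to norm 1.0
                  [0.3, 0.4]])   # norm 0.5, left unchanged
noiseless = dp_sgd_gradient(grads, clip_norm=1.0, noise_multiplier=0.0, rng=rng)
noisy = dp_sgd_gradient(grads, clip_norm=1.0, noise_multiplier=1.0, rng=rng)
```

The noise is added to every coordinate of the gradient regardless of what it encodes, which is the indiscriminate perturbation the abstract describes. Under the proposed framework, this step would apply only in stage (1); stage (2) runs a public model on the stage (1) output and, by the post-processing property of DP, does not consume additional privacy budget.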