DP-aware AdaLN-Zero: Taming Conditioning-Induced Heavy-Tailed Gradients in Differentially Private Diffusion

📅 2026-02-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
Conditional diffusion models trained under differential privacy (DP) suffer from heavy-tailed gradients induced by heterogeneous conditioning contexts, leading to severe clipping bias and degraded utility. The authors propose a plug-and-play, sensitivity-aware conditioning mechanism that jointly constrains the magnitude of the condition representations and the AdaLN modulation parameters via bounded reparameterization, suppressing extreme gradient-tail events before clipping occurs. The approach is, to the authors' knowledge, the first to explicitly incorporate the sensitivity of the conditioning mechanism into its design, mitigating the heavy-tailedness introduced by conditioning without altering the standard DP-SGD algorithm, while preserving representational capacity in non-private settings. Experiments on a real-world power dataset and the ETT benchmarks demonstrate significant improvements in imputation and forecasting over vanilla DP-SGD, with gradient diagnostics confirming reduced clipping distortion and effective reshaping of conditional gradient tails.
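For context, the DP-SGD step that the paper deliberately leaves unchanged can be sketched as below. This is a minimal illustration of per-example L2 clipping followed by Gaussian noise; the function name, `clip_norm`, and `noise_mult` are illustrative placeholders, not identifiers from the paper:

```python
import numpy as np

def dpsgd_clip_and_noise(per_example_grads, clip_norm=1.0, noise_mult=1.0, rng=None):
    """Sketch of a standard DP-SGD update: clip each example's gradient
    to L2 norm <= clip_norm, sum, add Gaussian noise, and average."""
    rng = np.random.default_rng() if rng is None else rng
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        # Heavy-tailed examples (norm >> clip_norm) get scaled down hard;
        # this disproportionate scaling is the clipping bias the paper targets.
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    total = np.sum(clipped, axis=0)
    noise = rng.normal(0.0, noise_mult * clip_norm, size=total.shape)
    return (total + noise) / len(per_example_grads)
```

Because clipping is applied per example before aggregation, a single conditioning-driven outlier gradient cannot dominate the update, but it does get biased toward zero, which is why reshaping the gradient tail before this step matters.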

📝 Abstract
Condition injection enables diffusion models to generate context-aware outputs, which is essential for many time-series tasks. However, heterogeneous conditioning contexts (e.g., observed history, missingness patterns, or outlier covariates) can induce heavy-tailed per-example gradients. Under Differentially Private Stochastic Gradient Descent (DP-SGD), these rare conditioning-driven heavy-tailed gradients disproportionately trigger global clipping, resulting in outlier-dominated updates, larger clipping bias, and degraded utility under a fixed privacy budget. In this paper, we propose DP-aware AdaLN-Zero, a drop-in sensitivity-aware conditioning mechanism for conditional diffusion transformers that limits conditioning-induced gain without modifying the DP-SGD mechanism. DP-aware AdaLN-Zero jointly constrains the conditioning representation's magnitude and the AdaLN modulation parameters via bounded reparameterization, suppressing extreme gradient-tail events before gradient clipping and noise injection. Empirically, DP-SGD equipped with DP-aware AdaLN-Zero improves interpolation/imputation and forecasting under matched privacy settings, with consistent gains on a real-world power dataset and two public ETT benchmarks over vanilla DP-SGD. Moreover, gradient diagnostics attribute these improvements to conditioning-specific tail reshaping and reduced clipping distortion, while expressiveness is preserved in non-private training. Overall, these results show that sensitivity-aware conditioning can substantially improve private conditional diffusion training without sacrificing standard performance.
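The bounded reparameterization described in the abstract might look roughly like the following NumPy sketch. The `tanh` bounding, the norm cap `c_max`, and all parameter names here are assumptions made for illustration, not the paper's actual formulation:

```python
import numpy as np

def bounded_adaln_modulation(h, cond, W_gamma, W_beta,
                             s_max=1.0, b_max=1.0, c_max=1.0, eps=1e-6):
    """Hypothetical sensitivity-aware AdaLN modulation.

    h: (d,) hidden activations; cond: (k,) conditioning embedding.
    W_gamma, W_beta: (d, k) projections producing scale/shift parameters.
    s_max, b_max, c_max: illustrative bounds on scale, shift, and the
    conditioning representation's norm.
    """
    # 1) Cap the conditioning representation's norm so extreme contexts
    #    cannot inflate downstream per-example gradients.
    norm = np.linalg.norm(cond)
    cond = cond * min(1.0, c_max / (norm + eps))

    # 2) Layer-normalize the hidden state (standard AdaLN pre-step).
    h_norm = (h - h.mean()) / (h.std() + eps)

    # 3) Bounded reparameterization of scale/shift via tanh, keeping
    #    |gamma| <= s_max and |beta| <= b_max regardless of the context.
    gamma = s_max * np.tanh(W_gamma @ cond)
    beta = b_max * np.tanh(W_beta @ cond)

    # AdaLN-Zero behavior: when cond ~ 0, tanh(0) = 0, so gamma = beta = 0
    # and the block reduces to plain layer normalization.
    return (1.0 + gamma) * h_norm + beta
```

The key property for DP training is that the modulation gain is bounded independently of the conditioning input, so no context, however anomalous, can push the per-example gradient far into the tail before DP-SGD's clipping step.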
Problem

Research questions and friction points this paper is trying to address.

differential privacy
diffusion models
heavy-tailed gradients
conditioning
gradient clipping
Innovation

Methods, ideas, or system contributions that make the work stand out.

DP-aware AdaLN-Zero
heavy-tailed gradients
differentially private diffusion
conditioning-induced sensitivity
gradient clipping bias