🤖 AI Summary
This work identifies a systematic “non-denoising” behavior in the practical sampling of conditional diffusion models: under text or observational conditioning, denoising trajectories consistently deviate from the idealized path, producing inconsistent generations across sampling algorithms such as DDPM and DDIM. To quantify the phenomenon, the authors introduce *Schedule Deviation*, a metric that captures the rate at which a sampler departs from a standard denoising process, together with a methodology to compute it. Through empirical analysis and theoretical justification, they show that the deviation persists regardless of model capacity or dataset scale, pointing to an inherent inductive bias rather than underfitting. They further argue, with theoretical support, that a bias towards smoothness makes it difficult to bridge the distinct denoising flows associated with different parts of the conditioning space. The result is an interpretable diagnostic for conditional diffusion models and a step towards more robust conditional sampling strategies and training paradigms.
📝 Abstract
We study the inductive biases of diffusion models with a conditioning variable, which have seen widespread application as both text-conditioned generative image models and observation-conditioned continuous control policies. We observe that when these models are queried conditionally, their generations consistently deviate from the idealized "denoising" process upon which diffusion models are formulated, inducing disagreement between popular sampling algorithms (e.g., DDPM, DDIM). We introduce Schedule Deviation, a rigorous measure that captures the rate of deviation from a standard denoising process, and provide a methodology to compute it. Crucially, we demonstrate that the deviation from an idealized denoising process occurs irrespective of model capacity or the amount of training data. We posit that this phenomenon arises from the difficulty of bridging distinct denoising flows across different parts of the conditioning space, and show theoretically how such a deviation can result from an inductive bias towards smoothness.
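Neither the paper's definition of Schedule Deviation nor its trained models are reproduced here, but the comparison setup the abstract alludes to (one conditional noise predictor, two samplers) can be sketched in a few lines. The NumPy snippet below is a minimal illustration under assumed names: `eps_model` is a random linear stand-in for a trained conditional network, not the paper's model, and the per-step gap it reports is not the paper's metric. It simply runs standard DDPM and DDIM updates side by side from the same condition and starting noise and tracks how far the two trajectories drift apart. Note that even for an exact score model, DDPM and DDIM agree only in distribution rather than per sample, so a real diagnostic would compare statistics over many draws; this sketch only shows the mechanics of the comparison.

```python
# Hedged sketch: compare DDPM and DDIM trajectories produced by the SAME
# conditional noise predictor. All names here are illustrative stand-ins,
# not the paper's code or its Schedule Deviation metric.
import numpy as np

rng = np.random.default_rng(0)
D, T = 16, 100                              # data dimension, diffusion steps

# Linear noise schedule and cumulative products (standard DDPM notation).
betas = np.linspace(1e-4, 2e-2, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

# Hypothetical conditional noise predictor eps_theta(x_t, t, cond):
# a fixed random linear map, used only so the script runs end to end.
W = rng.standard_normal((D, 2 * D)) / np.sqrt(2 * D)
def eps_model(x_t, t, cond):
    return W @ np.concatenate([x_t, cond])

def ddpm_step(x_t, t, cond, rng):
    """Stochastic DDPM update x_t -> x_{t-1} (sigma_t^2 = beta_t variance choice)."""
    eps = eps_model(x_t, t, cond)
    mean = (x_t - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
    if t == 0:
        return mean
    return mean + np.sqrt(betas[t]) * rng.standard_normal(D)

def ddim_step(x_t, t, cond):
    """Deterministic DDIM update x_t -> x_{t-1} (eta = 0)."""
    eps = eps_model(x_t, t, cond)
    x0_pred = (x_t - np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alpha_bars[t])
    ab_prev = alpha_bars[t - 1] if t > 0 else 1.0
    return np.sqrt(ab_prev) * x0_pred + np.sqrt(1.0 - ab_prev) * eps

# Same condition and same initial noise for both samplers.
cond = rng.standard_normal(D)
x_ddpm = x_ddim = rng.standard_normal(D)
gap = []
for t in reversed(range(T)):
    x_ddpm = ddpm_step(x_ddpm, t, cond, rng)
    x_ddim = ddim_step(x_ddim, t, cond)
    gap.append(np.linalg.norm(x_ddpm - x_ddim) / np.sqrt(D))

print(f"final per-dimension gap between DDPM and DDIM samples: {gap[-1]:.3f}")
```

A distribution-level version of this probe would repeat the loop over many seeds and conditions and compare summary statistics (or sample quality metrics) of the two samplers, which is closer in spirit to the disagreement the abstract describes.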