🤖 AI Summary
This paper identifies and formalizes a pervasive "noise shift" problem in diffusion model sampling: a systematic mismatch between the noise levels prescribed by the schedule and the actual noise levels encoded in intermediate latent states, which leads to inaccurate denoising updates and out-of-distribution inputs to the denoiser. To address this, the authors propose Noise Awareness Guidance (NAG), a correction method that explicitly steers sampling trajectories back into consistency with the pre-defined noise schedule within the reverse SDE/ODE framework. A classifier-free variant jointly trains a noise-conditional and a noise-unconditional model via noise-condition dropout, removing the need for external classifiers. Evaluated on ImageNet synthesis and diverse supervised fine-tuning tasks, NAG achieves significant improvements in FID, LPIPS, and other metrics, mitigating noise shift while improving sampling stability and generalization.
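As a toy numerical illustration of the core diagnosis (not code from the paper), noise shift can be made concrete by comparing the noise level a schedule prescribes at step t with the empirical noise level of an intermediate state. The variance-preserving forward form, the `alpha_bar` value, and the 1.1x inflation factor below are all assumptions chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10_000                            # latent dimensionality (illustrative)
alpha_bar = 0.64                      # hypothetical scheduled signal fraction at step t
sigma_sched = np.sqrt(1 - alpha_bar)  # noise level the schedule prescribes (0.6)

x0 = rng.standard_normal(d)           # clean sample (unit variance, for illustration)
eps = rng.standard_normal(d)

# A state that follows the schedule exactly:
x_t = np.sqrt(alpha_bar) * x0 + sigma_sched * eps

# A "shifted" state, e.g. discretization error has inflated the noise by 10%:
x_t_shifted = np.sqrt(alpha_bar) * x0 + 1.1 * sigma_sched * eps

def empirical_sigma(x_t, x0, alpha_bar):
    """Empirical noise level of x_t, using oracle access to the clean x0."""
    return np.std(x_t - np.sqrt(alpha_bar) * x0)

print(empirical_sigma(x_t, x0, alpha_bar), sigma_sched)          # agree
print(empirical_sigma(x_t_shifted, x0, alpha_bar), sigma_sched)  # empirical > scheduled: noise shift
```

In real sampling there is no oracle `x0`, so any practical correction must rely on estimated rather than exact noise levels; the point here is only that "scheduled" and "actual" noise levels are distinct quantities that can drift apart.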
📝 Abstract
Existing denoising generative models rely on solving discretized reverse-time SDEs or ODEs. In this paper, we identify a long-overlooked yet pervasive issue in this family of models: a misalignment between the pre-defined noise level and the actual noise level encoded in intermediate states during sampling. We refer to this misalignment as noise shift. Through empirical analysis, we demonstrate that noise shift is widespread in modern diffusion models and exhibits a systematic bias, leading to sub-optimal generation through both out-of-distribution generalization failures and inaccurate denoising updates. To address this problem, we propose Noise Awareness Guidance (NAG), a simple yet effective correction method that explicitly steers sampling trajectories to remain consistent with the pre-defined noise schedule. We further introduce a classifier-free variant of NAG, which jointly trains a noise-conditional and a noise-unconditional model via noise-condition dropout, thereby eliminating the need for external classifiers. Extensive experiments, including ImageNet generation and various supervised fine-tuning tasks, show that NAG consistently mitigates noise shift and substantially improves the generation quality of mainstream diffusion models.
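The abstract does not spell out the guidance rule, but "classifier-free variant" plus "noise-condition dropout" suggests a combination analogous to standard classifier-free guidance, applied to the noise condition rather than a class label. The function below is a hypothetical sketch under that analogy; the name `nag_guided_eps` and the guidance-scale parameter `w` are assumptions, not the paper's notation:

```python
import numpy as np

def nag_guided_eps(eps_cond, eps_uncond, w):
    """Hypothetical CFG-style combination of the two branches (illustrative only).

    eps_cond   : denoiser output given the scheduled noise condition
    eps_uncond : denoiser output with the noise condition dropped
                 (as trained via noise-condition dropout)
    w          : guidance scale; w=0 gives the unconditional prediction,
                 w=1 the conditional one, w>1 extrapolates past it
    """
    return eps_uncond + w * (eps_cond - eps_uncond)

# Tiny usage example with dummy predictions:
eps_c = np.array([1.0, 1.0, 1.0])
eps_u = np.array([0.0, 0.0, 0.0])
guided = nag_guided_eps(eps_c, eps_u, w=2.0)
```

If the analogy holds, the extrapolated prediction pushes each denoising update toward the behavior of the noise-conditional model, i.e. toward states consistent with the scheduled noise level, which is the stated goal of the correction.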