🤖 AI Summary
This work addresses a key limitation of classifier-guided diffusion models: they rely on noise-robust classifiers, a requirement standard classifiers trained only on clean data rarely meet, since their accuracy degrades severely under diffusion noise. We propose the first stable guidance method tailored to non-robust classifiers. Our core innovation is to use the one-step denoised image prediction as an intermediate supervision signal and to smooth the classifier-guided gradients with an exponential moving average (EMA), mitigating gradient instability under noisy conditions. Crucially, our approach requires no classifier retraining or fine-tuning and integrates seamlessly into standard diffusion sampling pipelines. Experiments on CelebA, SportBalls, and CelebA-HQ demonstrate substantial improvements in both class-conditional guidance stability and accuracy, while preserving sample diversity and visual fidelity.
📝 Abstract
Classifier guidance is intended to steer a diffusion process such that a given classifier reliably recognizes the generated data point as a certain class. However, most classifier guidance approaches are restricted to robust classifiers, which were specifically trained on the noise of the diffusion forward process. We extend classifier guidance to work with general, non-robust classifiers that were trained without noise. We analyze the sensitivity of both non-robust and robust classifiers to noise of the diffusion process on the standard CelebA data set, the specialized SportBalls data set, and the high-dimensional real-world CelebA-HQ data set. Our findings reveal that non-robust classifiers exhibit significant accuracy degradation under noisy conditions, leading to unstable guidance gradients. To mitigate these issues, we propose a method that utilizes one-step denoised image predictions and applies stabilization techniques inspired by stochastic optimization, such as exponential moving averages. Experimental results demonstrate that our approach improves the stability of classifier guidance while maintaining sample diversity and visual quality. This work advances conditional sampling in generative models, enabling a broader range of classifiers to be used for guidance.
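The two stabilization ideas in the abstract can be illustrated with a minimal sketch: the classifier gradient is evaluated on a one-step denoised prediction `x0_hat` rather than on the noisy sample `x_t`, and that gradient is smoothed with an exponential moving average before it shifts the sampling mean. All components below (the noise predictor, the toy classifier gradient, the schedule, the EMA decay, and the guidance scale) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Toy sketch of stabilized classifier guidance: the classifier gradient is
# computed on the one-step denoised estimate x0_hat, then EMA-smoothed before
# guiding the DDPM ancestral sampling step. All model pieces are stand-ins.

rng = np.random.default_rng(0)
T = 50
betas = np.linspace(1e-4, 0.02, T)      # linear noise schedule (assumed)
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)

target = np.array([2.0, -1.0])          # "class center" of the toy classifier

def eps_pred(x_t, t):
    """Toy noise predictor for data concentrated at the origin."""
    return x_t / np.sqrt(1.0 - alpha_bar[t])

def classifier_grad(x0_hat):
    """Toy grad of log p(y | x): a quadratic score pulling toward `target`."""
    return target - x0_hat

x = rng.normal(size=2)                  # start from pure noise
g_ema = np.zeros(2)
ema_decay = 0.9                         # hypothetical smoothing factor
scale = 0.3                             # hypothetical guidance scale

for t in range(T - 1, -1, -1):
    eps = eps_pred(x, t)
    # One-step denoised prediction (Tweedie-style estimate of the clean image).
    x0_hat = (x - np.sqrt(1.0 - alpha_bar[t]) * eps) / np.sqrt(alpha_bar[t])
    # Classifier gradient on the denoised estimate, then EMA smoothing.
    g = classifier_grad(x0_hat)
    g_ema = ema_decay * g_ema + (1.0 - ema_decay) * g
    # Standard DDPM ancestral step; the smoothed gradient shifts the mean.
    mean = (x - betas[t] / np.sqrt(1.0 - alpha_bar[t]) * eps) / np.sqrt(alphas[t])
    mean = mean + scale * g_ema
    noise = rng.normal(size=2) if t > 0 else np.zeros(2)
    x = mean + np.sqrt(betas[t]) * noise

print(x)  # the final sample is pulled toward the toy class center
```

The EMA averages the guidance gradient across reverse steps, so a single unstable gradient from the non-robust classifier at one noise level cannot derail the trajectory; evaluating on `x0_hat` keeps the classifier's input close to the clean-data distribution it was trained on.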