🤖 AI Summary
Existing diffusion models are split between two paradigms: “hot” diffusion (pure noise injection) neglects the correlation between high-frequency image detail and low-frequency structure, causing random behavior in early generation steps; “cold” diffusion (pure blurring) does not model stochasticity, leading to out-of-manifold samples. This paper introduces Warm Diffusion, the first framework unifying noise and blur degradation processes via a Blur-Noise hybrid mechanism. Its core contributions are: (1) a spectral-analysis-based divide-and-conquer strategy that decouples denoising from deblurring; (2) the Blur-to-Noise Ratio (BNR), a novel metric enabling adaptive, dynamic control of generation behavior; and (3) a joint noise-blur scheduling scheme coupled with a score-matching estimation framework. Evaluated on multiple image generation benchmarks, Warm Diffusion significantly improves high-frequency detail fidelity and structural consistency while effectively mitigating out-of-manifold generation, outperforming both pure-noise and pure-blur baselines.
📝 Abstract
Diffusion probabilistic models have achieved remarkable success in generative tasks across diverse data types. While recent studies have explored alternative degradation processes beyond Gaussian noise, this paper bridges two key diffusion paradigms: hot diffusion, which relies entirely on noise, and cold diffusion, which uses only blurring without noise. We argue that hot diffusion fails to exploit the strong correlation between high-frequency image details and low-frequency structures, leading to random behavior in the early steps of generation. Conversely, while cold diffusion leverages image correlations for prediction, it neglects the role of noise (randomness) in shaping the data manifold, resulting in out-of-manifold issues and partially explaining its performance drop. To integrate both strengths, we propose Warm Diffusion, a unified Blur-Noise Mixture Diffusion Model (BNMD) that controls blurring and noise jointly. Our divide-and-conquer strategy exploits the spectral dependency in images, simplifying score model estimation by disentangling the denoising and deblurring processes. We further analyze the Blur-to-Noise Ratio (BNR) using spectral analysis to investigate the trade-off between model learning dynamics and changes in the data manifold. Extensive experiments across benchmarks validate the effectiveness of our approach for image generation.
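To make the hot/cold/warm distinction concrete, the following is a minimal NumPy sketch of a forward degradation that mixes Gaussian blur with additive Gaussian noise. It is an illustration under stated assumptions, not the paper's formulation: the function names `gaussian_blur_fft`, `degrade`, and `blur_to_noise_ratio`, the parameters `sigma_blur` and `sigma_noise`, and the definition of BNR as `sigma_blur / sigma_noise` are all hypothetical choices made here, since the abstract does not give them.

```python
import numpy as np


def gaussian_blur_fft(image, sigma_blur):
    """Blur a 2-D array with a Gaussian kernel applied in the frequency domain."""
    h, w = image.shape
    fy = np.fft.fftfreq(h)[:, None]  # vertical frequencies, cycles per pixel
    fx = np.fft.fftfreq(w)[None, :]  # horizontal frequencies, cycles per pixel
    # Fourier transform of a Gaussian with standard deviation sigma_blur (pixels).
    # High frequencies are attenuated most; the mean (zero frequency) is kept.
    transfer = np.exp(-2.0 * np.pi**2 * sigma_blur**2 * (fx**2 + fy**2))
    return np.real(np.fft.ifft2(np.fft.fft2(image) * transfer))


def degrade(image, sigma_blur, sigma_noise, rng):
    """Hypothetical blur-noise mixture: blur first, then add Gaussian noise.

    sigma_blur == 0  -> noise only ("hot" diffusion)
    sigma_noise == 0 -> blur only ("cold" diffusion)
    both > 0         -> a mixture of the two ("warm")
    """
    blurred = gaussian_blur_fft(image, sigma_blur)
    return blurred + sigma_noise * rng.standard_normal(image.shape)


def blur_to_noise_ratio(sigma_blur, sigma_noise):
    """Assumed definition for illustration only: ratio of blur scale to noise scale."""
    return sigma_blur / sigma_noise


rng = np.random.default_rng(0)
x0 = rng.standard_normal((32, 32))  # stand-in for a grayscale image
hot = degrade(x0, sigma_blur=0.0, sigma_noise=1.0, rng=rng)
cold = degrade(x0, sigma_blur=2.0, sigma_noise=0.0, rng=rng)
warm = degrade(x0, sigma_blur=2.0, sigma_noise=1.0, rng=rng)
```

In this sketch, varying the two scales together traces out a family of degradations between the pure-noise and pure-blur extremes; how the paper actually schedules them and defines BNR should be taken from the paper itself.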