🤖 AI Summary
This work addresses the inefficiency of existing diffusion models in handling high-noise states at full resolution, where such states are informationally equivalent to low-resolution images. To overcome this limitation, the authors propose a scale-space diffusion framework that integrates scale-space theory into the diffusion process, using generalized linear degradations (such as downsampling) to construct an efficient generative mechanism. They further introduce Flexi-UNet, a network architecture that adaptively activates only the modules needed to perform both resolution-preserving and super-resolution denoising. The method demonstrates strong empirical performance on CelebA and ImageNet, and extensive experiments validate its favorable scalability across varying resolutions and network depths.
📝 Abstract
Diffusion models degrade images through noise, and reversing this process reveals an information hierarchy across timesteps. Scale-space theory exhibits a similar hierarchy via low-pass filtering. We formalize this connection and show that highly noisy diffusion states contain no more information than small, downsampled images, raising the question of why they must be processed at full resolution. To address this, we fuse scale spaces into the diffusion process by formulating a family of diffusion models with generalized linear degradations, together with practical implementations. Using downsampling as the degradation yields our proposed Scale Space Diffusion. To support Scale Space Diffusion, we introduce Flexi-UNet, a UNet variant that performs resolution-preserving and resolution-increasing denoising using only the necessary parts of the network. We evaluate our framework on CelebA and ImageNet and analyze its scaling behavior across resolutions and network depths. Our project website is publicly available at https://prateksha.github.io/projects/scale-space-diffusion/ .
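The core idea — a forward process whose degradation is a generalized linear operator rather than noise alone — can be sketched in a few lines. The snippet below is a minimal, hypothetical illustration (the `downsample` operator, linear schedule, and `forward_step` helper are assumptions for exposition, not the paper's actual implementation): the clean image is linearly degraded by average-pool downsampling and mixed with Gaussian noise, so the noisy state lives at a lower resolution.

```python
import numpy as np

def downsample(x, factor=2):
    """Average-pool downsampling: a simple linear degradation operator."""
    h, w = x.shape
    return x.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def forward_step(x0, t, T, factor=2, rng=None):
    """One forward step of a scale-space-style diffusion (toy schedule):
    linearly degrade the clean image, then mix in Gaussian noise.
    Illustrative only; the paper's schedule and operator may differ."""
    rng = rng or np.random.default_rng(0)
    alpha = 1.0 - t / T                     # signal retained at step t
    sigma = np.sqrt(1.0 - alpha ** 2)       # matching noise level
    z = downsample(x0, factor)              # generalized linear degradation
    return alpha * z + sigma * rng.standard_normal(z.shape)

x0 = np.random.default_rng(1).standard_normal((8, 8))
xt = forward_step(x0, t=5, T=10)
print(xt.shape)  # noisy state is at the lower resolution: (4, 4)
```

Because the degraded state has fewer pixels, a denoiser operating on it (as Flexi-UNet does for high-noise timesteps) processes far less data than a full-resolution model would at the same noise level.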