🤖 AI Summary
This work addresses a limitation of conventional diffusion models: their handcrafted noise schedules struggle to balance efficiency and generation quality across resolutions and often include redundant steps. The authors propose a spectrum-guided, instance-adaptive noise scheduling method that derives a “tight” noise schedule from the spectral characteristics of images and theoretical bounds on the diffusion process. During inference, this schedule is applied dynamically through conditional sampling. By incorporating spectral information into the selection of noise levels, a first in the field, the approach significantly improves generation quality at low inference step counts in single-stage, pixel-level diffusion models while reducing unnecessary computation.
📝 Abstract
Denoising diffusion models are widely used for high-quality image and video generation. Their performance depends on noise schedules, which define the distribution of noise levels applied during training and the sequence of noise levels traversed during sampling. Noise schedules are typically handcrafted and require manual tuning across different resolutions. In this work, we propose a principled way to design per-instance noise schedules for pixel diffusion, based on the image's spectral properties. By deriving theoretical bounds on the efficacy of the minimum and maximum noise levels, we design "tight" noise schedules that eliminate redundant steps. During inference, we conditionally sample such noise schedules. Experiments show that our noise schedules improve the generation quality of single-stage pixel diffusion models, particularly in the low-step regime.
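The paper's details are not given here, but the core idea of a "tight", spectrum-derived schedule can be sketched: estimate an image's radially averaged power spectrum, use the amplitudes of its coarsest and finest structure as rough bounds for the maximum and minimum useful noise levels, and place the sampling steps only inside that range. This is a minimal illustrative sketch, not the authors' method; the helper names (`spectral_energy`, `tight_schedule`), the log-linear spacing, and the way the bounds are read off the spectrum are all assumptions for illustration.

```python
import numpy as np

def spectral_energy(image: np.ndarray) -> np.ndarray:
    """Radially averaged power spectrum of a grayscale image (illustrative)."""
    f = np.fft.fftshift(np.fft.fft2(image))
    power = np.abs(f) ** 2
    h, w = image.shape
    y, x = np.indices((h, w))
    r = np.hypot(y - h / 2, x - w / 2).astype(int)
    # Mean power within each radial-frequency bin.
    sums = np.bincount(r.ravel(), weights=power.ravel())
    counts = np.bincount(r.ravel())
    return sums / np.maximum(counts, 1)

def tight_schedule(image: np.ndarray, n_steps: int = 10,
                   floor: float = 1e-3) -> np.ndarray:
    """Hypothetical 'tight' noise schedule: restrict the sigma range to where
    noise actually competes with this image's spectral content, then spread
    n_steps log-linearly inside it (assumed spacing, not from the paper)."""
    spec = spectral_energy(image)
    amps = np.sqrt(spec / image.size)     # per-band signal amplitude (rough proxy)
    sigma_min = max(amps[amps > 0].min(), floor)  # finest detail worth denoising
    sigma_max = amps.max()                        # coarsest structure to destroy
    return np.exp(np.linspace(np.log(sigma_max), np.log(sigma_min), n_steps))
```

A sampler would then traverse these noise levels from `sigma_max` down to `sigma_min` instead of a fixed handcrafted range, which is how redundant steps outside the image's effective spectral band would be skipped.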