🤖 AI Summary
This work addresses the high inference cost of diffusion models, which stems from repeatedly evaluating a heavyweight neural network. To mitigate this, the authors propose a dual-rate diffusion mechanism that interleaves sparse invocations of a heavyweight context encoder with frequent applications of a lightweight denoising network, which reuses the high-dimensional features extracted by the heavyweight component to accelerate sampling. On ImageNet, the method reduces inference computation by a factor of 2–4 while matching the generation quality of standard baselines. It is also compatible with existing distillation techniques such as Moment Matching Distillation, enabling further efficiency gains in few-step generation.
📝 Abstract
Diffusion models achieve state-of-the-art generative performance but incur high computational costs during inference because a heavy neural network must be evaluated repeatedly. In this work, we propose Dual-Rate Diffusion, a method that accelerates sampling by interleaving the execution of a heavy, high-capacity context encoder with that of a light, efficient denoising model. The context encoder is evaluated sparsely to extract high-dimensional features, which the light denoising model reuses at every step to refine the sample efficiently. This approach significantly accelerates inference without compromising sample quality. On ImageNet benchmarks, Dual-Rate Diffusion matches the performance of standard baselines while reducing computational cost by a factor of $2$–$4$. Furthermore, we demonstrate that our method is compatible with distillation techniques, such as Moment Matching Distillation, enabling further efficiency gains in few-step generation.
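The dual-rate schedule described in the abstract can be sketched as a sampling loop in which features from the heavy encoder are cached and refreshed only every `encoder_interval` steps, while the light denoiser runs at every step and reuses the cached features. This is a minimal, hypothetical Python sketch, not the authors' implementation: `heavy_encoder`, `light_denoiser`, `encoder_interval`, the toy networks, the abstract unit costs, and the simple interpolation update are all illustrative assumptions. The baseline in `cost_ratio` is likewise assumed to evaluate the heavy network at every step.

```python
from dataclasses import dataclass
from typing import Callable, List

Vector = List[float]


@dataclass
class CostMeter:
    """Counts calls to each network; unit costs are hypothetical placeholders."""
    heavy_calls: int = 0
    light_calls: int = 0

    def total(self, heavy_cost: float, light_cost: float) -> float:
        return self.heavy_calls * heavy_cost + self.light_calls * light_cost


def dual_rate_sample(
    x: Vector,
    num_steps: int,
    encoder_interval: int,
    heavy_encoder: Callable[[Vector, float], Vector],
    light_denoiser: Callable[[Vector, float, Vector], Vector],
    meter: CostMeter,
) -> Vector:
    """Refine x for num_steps steps, refreshing heavy features every encoder_interval steps."""
    if num_steps < 1 or encoder_interval < 1:
        raise ValueError("num_steps and encoder_interval must be >= 1")
    features: Vector = []
    for i in range(num_steps):
        t = 1.0 - i / num_steps  # noise level decreasing from 1 toward 0
        if i % encoder_interval == 0:
            features = heavy_encoder(x, t)  # sparse, expensive call
            meter.heavy_calls += 1
        x0_hat = light_denoiser(x, t, features)  # cheap call, reuses cached features
        meter.light_calls += 1
        # Placeholder update (not a specific ODE/SDE solver): move toward the
        # predicted clean sample; on the last step this lands exactly on x0_hat.
        step = 1.0 / (num_steps - i)
        x = [xi + step * (x0i - xi) for xi, x0i in zip(x, x0_hat)]
    return x


def cost_ratio(num_steps: int, encoder_interval: int,
               heavy_cost: float, light_cost: float) -> float:
    """Baseline cost (heavy net every step) divided by dual-rate cost."""
    baseline = num_steps * heavy_cost
    heavy_calls = -(-num_steps // encoder_interval)  # ceil(num_steps / interval)
    dual = heavy_calls * heavy_cost + num_steps * light_cost
    return baseline / dual


# Hypothetical toy networks, standing in for real neural networks.
def toy_heavy_encoder(x: Vector, t: float) -> Vector:
    return [sum(x) / len(x)]  # "context feature": the mean of the sample


def toy_light_denoiser(x: Vector, t: float, features: Vector) -> Vector:
    return [0.5 * xi + 0.5 * features[0] for xi in x]  # shrink toward cached context


if __name__ == "__main__":
    meter = CostMeter()
    sample = dual_rate_sample([1.0, -1.0, 3.0], 32, 4,
                              toy_heavy_encoder, toy_light_denoiser, meter)
    print(meter.heavy_calls, meter.light_calls, meter.total(10.0, 1.0))
    print(cost_ratio(32, 4, 10.0, 1.0))
```

For instance, with 32 steps, a refresh interval of 4, and a heavy network assumed to cost 10 units versus 1 unit for the light one, the heavy network runs 8 times and the light one 32 times, for 112 units against 320 for the assumed baseline (about 2.9×). These figures only illustrate the arithmetic of the schedule and are not results from the paper.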