🤖 AI Summary
How can diffusion model inference be made more efficient under resource constraints, without fine-tuning? This paper proposes PostDiff, a training-free framework that jointly improves efficiency via input-level mixed-resolution denoising and module-level cache reuse, while preserving generation quality. Crucially, the authors find that reducing per-step computational cost (e.g., via resolution scaling and module reuse) yields a better cost-quality trade-off than decreasing the number of denoising steps. PostDiff requires no fine-tuning or retraining, yet delivers end-to-end acceleration across multiple state-of-the-art diffusion models, achieving substantial speedups with stable FID and LPIPS scores. By eliminating redundant computation at both the input and module levels, PostDiff offers a lightweight, plug-and-play solution for efficient deployment. The implementation is publicly available.
📄 Abstract
Diffusion models have shown remarkable success across generative tasks, yet their high computational demands challenge deployment on resource-limited platforms. This paper investigates a critical question for compute-optimal diffusion model deployment: under a post-training setting without fine-tuning, is it more effective to reduce the number of denoising steps or to make each step's inference cheaper? Intuitively, reducing the number of denoising steps increases the variability of the distributions across steps, making the model more sensitive to compression. In contrast, keeping more denoising steps makes the inter-step differences smaller, which preserves redundancy and makes post-training compression more feasible. To systematically examine this, we propose PostDiff, a training-free framework for accelerating pre-trained diffusion models by reducing redundancy at both the input level and the module level in a post-training manner. At the input level, we propose a mixed-resolution denoising scheme based on the insight that reducing generation resolution in early denoising steps can enhance low-frequency components and improve final generation fidelity. At the module level, we employ a hybrid module caching strategy to reuse computations across denoising steps. Extensive experiments and ablation studies demonstrate that (1) PostDiff can significantly improve the fidelity-efficiency trade-off of state-of-the-art diffusion models, and (2) to boost efficiency while maintaining decent generation fidelity, reducing per-step inference cost is often more effective than reducing the number of denoising steps. Our code is available at https://github.com/GATECH-EIC/PostDiff.
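To make the two ideas concrete, here is a minimal, self-contained toy sketch (not the authors' implementation) of how mixed-resolution denoising and step-wise module caching could fit into a sampling loop. The function names (`toy_denoiser`, `mixed_resolution_sampling`), the reuse interval, and the nearest-neighbor upsampling are all illustrative assumptions; a real system would wrap an actual diffusion model.

```python
import numpy as np

def toy_denoiser(x, step, cache, reuse_interval=2):
    """One stand-in denoising step. The tanh acts as a hypothetical
    'expensive module' whose output is cached and reused on alternate
    steps (a caricature of hybrid module caching)."""
    if (step % reuse_interval == 0 or cache["feat"] is None
            or cache["feat"].shape != x.shape):
        cache["feat"] = np.tanh(x)      # recompute the expensive module
    return x - 0.1 * cache["feat"]      # cheap update reusing the cache

def mixed_resolution_sampling(shape=(16, 16), total_steps=8, low_res_steps=4):
    """Hypothetical mixed-resolution schedule: run the early denoising
    steps at half resolution, then upsample and finish at full size."""
    rng = np.random.default_rng(0)
    h, w = shape
    x = rng.standard_normal((h // 2, w // 2))  # start at low resolution
    cache = {"feat": None}
    for step in range(total_steps):
        if step == low_res_steps:
            x = np.kron(x, np.ones((2, 2)))    # nearest-neighbor upsample
            cache["feat"] = None               # resolution changed: drop cache
        x = toy_denoiser(x, step, cache)
    return x
```

Because the early low-resolution steps operate on a quarter of the pixels and cached steps skip the expensive module entirely, per-step cost drops without shortening the denoising schedule, which is the trade-off the paper advocates.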