🤖 AI Summary
How can diffusion model inference be made more efficient under resource constraints, without fine-tuning? This paper proposes PostDiff, a training-free framework that jointly improves efficiency via input-level mixed-resolution denoising and module-level cache reuse, while preserving generation quality. Crucially, the authors find that reducing per-step computational cost (e.g., via resolution scaling and module reuse) yields a better cost-quality trade-off than decreasing the number of denoising steps. PostDiff requires no fine-tuning or retraining, yet delivers end-to-end acceleration across multiple state-of-the-art diffusion models, achieving substantial speedups with stable FID and LPIPS scores. By eliminating redundant computation at both the input and module levels, PostDiff offers a lightweight, plug-and-play solution for efficient deployment. The implementation is publicly available.
📄 Abstract
Diffusion models have shown remarkable success across generative tasks, yet their high computational demands challenge deployment on resource-limited platforms. This paper investigates a critical question for compute-optimal diffusion model deployment: under a post-training setting without fine-tuning, is it more effective to reduce the number of denoising steps or to make each step's inference cheaper? Intuitively, reducing the number of denoising steps increases the variability of the distributions across steps, making the model more sensitive to compression. In contrast, keeping more denoising steps makes the inter-step differences smaller, which preserves redundancy and makes post-training compression more feasible. To systematically examine this, we propose PostDiff, a training-free framework for accelerating pre-trained diffusion models by reducing redundancy at both the input level and the module level in a post-training manner. At the input level, we propose a mixed-resolution denoising scheme based on the insight that reducing generation resolution in early denoising steps can enhance low-frequency components and improve final generation fidelity. At the module level, we employ a hybrid module caching strategy to reuse computations across denoising steps. Extensive experiments and ablation studies demonstrate that (1) PostDiff can significantly improve the fidelity-efficiency trade-off of state-of-the-art diffusion models, and (2) to boost efficiency while maintaining decent generation fidelity, reducing per-step inference cost is often more effective than reducing the number of denoising steps. Our code is available at https://github.com/GATECH-EIC/PostDiff.
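To make the two ideas concrete, here is a minimal, self-contained toy sketch (not the authors' implementation) of how mixed-resolution denoising and step-wise module caching could fit into a sampling loop. The function names (`toy_denoiser`, `mixed_resolution_sampling`), the reuse interval, and the nearest-neighbor upsampling are all illustrative assumptions; a real system would wrap an actual diffusion model.

```python
import numpy as np

def toy_denoiser(x, step, cache, reuse_interval=2):
    """One stand-in denoising step. The tanh acts as a hypothetical
    'expensive module' whose output is cached and reused on alternate
    steps (a caricature of hybrid module caching)."""
    if (step % reuse_interval == 0 or cache["feat"] is None
            or cache["feat"].shape != x.shape):
        cache["feat"] = np.tanh(x)      # recompute the expensive module
    return x - 0.1 * cache["feat"]      # cheap update reusing the cache

def mixed_resolution_sampling(shape=(16, 16), total_steps=8, low_res_steps=4):
    """Hypothetical mixed-resolution schedule: run the early denoising
    steps at half resolution, then upsample and finish at full size."""
    rng = np.random.default_rng(0)
    h, w = shape
    x = rng.standard_normal((h // 2, w // 2))  # start at low resolution
    cache = {"feat": None}
    for step in range(total_steps):
        if step == low_res_steps:
            x = np.kron(x, np.ones((2, 2)))    # nearest-neighbor upsample
            cache["feat"] = None               # resolution changed: drop cache
        x = toy_denoiser(x, step, cache)
    return x
```

Because the early low-resolution steps operate on a quarter of the pixels and cached steps skip the expensive module entirely, per-step cost drops without shortening the denoising schedule, which is the trade-off the paper advocates.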