Timestep-Aware Block Masking for Efficient Diffusion Model Inference

📅 2026-03-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the high inference latency and computational cost of diffusion models arising from their iterative denoising process. The authors propose a timestep-aware dynamic inference acceleration method that learns a dedicated mask for each denoising step, dynamically skipping redundant network blocks and reusing cached intermediate features to reduce per-step computation. To avoid the high memory overhead of backpropagating through the full denoising chain, masks are optimized independently per timestep. Stability is further improved through timestep-aware loss scaling and a knowledge-guided mask rectification strategy. The approach achieves significant inference speedups across diverse architectures, including DDPM, LDM, DiT, and PixArt, while preserving generation quality.

📝 Abstract
Diffusion Probabilistic Models (DPMs) have achieved great success in image generation but suffer from high inference latency due to their iterative denoising nature. Motivated by the evolving feature dynamics across the denoising trajectory, we propose a novel framework to optimize the computational graph of pre-trained DPMs on a per-timestep basis. By learning timestep-specific masks, our method dynamically determines which blocks to execute or bypass through feature reuse at each inference stage. Unlike global optimization methods that incur prohibitive memory costs via full-chain backpropagation, our method optimizes masks for each timestep independently, ensuring a memory-efficient training process. To guide this process, we introduce a timestep-aware loss scaling mechanism that prioritizes feature fidelity during sensitive denoising phases, complemented by a knowledge-guided mask rectification strategy to prune redundant spatial-temporal dependencies. Our approach is architecture-agnostic and demonstrates significant efficiency gains across a broad spectrum of models, including DDPM, LDM, DiT, and PixArt. Experimental results show that by treating the denoising process as a sequence of optimized computational paths, our method achieves a superior balance between sampling speed and generative quality. Our code will be released.
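The core mechanism described in the abstract can be sketched in a few lines: each timestep carries a binary mask over the network's blocks, and a masked-out block is skipped at that step, with its most recently cached output reused in place of a fresh computation. The sketch below is a minimal illustration under assumed names (`run_denoiser`, `blocks`, `masks` are not from the paper's code); in practice the blocks would be residual modules of a pre-trained DPM and the masks would be learned per timestep.

```python
def run_denoiser(x, blocks, masks, num_timesteps):
    """Apply a block-masked denoiser over several timesteps.

    blocks: list of callables (stand-ins for network blocks).
    masks:  masks[t][i] == True means block i executes at timestep t;
            False means its cached output from an earlier step is reused.
    """
    cache = [None] * len(blocks)  # last computed output per block
    for t in range(num_timesteps):
        h = x
        for i, block in enumerate(blocks):
            if masks[t][i] or cache[i] is None:
                cache[i] = block(h)  # execute the block, refresh its cache
            # else: skip computation entirely and reuse the cached feature
            h = cache[i]
        x = h
    return x
```

Reusing a cached feature is an approximation, since the block's input has changed between timesteps; the paper's learned masks and timestep-aware loss scaling exist precisely to decide where this approximation is safe.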
Problem

Research questions and friction points this paper is trying to address.

Diffusion Probabilistic Models
inference latency
iterative denoising
computational efficiency
image generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

timestep-aware masking
efficient diffusion inference
dynamic block skipping
feature reuse
memory-efficient training