OUSAC: Optimized Guidance Scheduling with Adaptive Caching for DiT Acceleration

📅 2025-12-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
Diffusion Transformers (DiTs) suffer from low inference efficiency, and classifier-free guidance (CFG), while improving sample quality and controllability, doubles computational cost. This paper proposes an efficient CFG inference framework tailored for DiTs. First, it introduces a novel sparse CFG scheduling paradigm under *variable* guidance scales, departing from the conventional fixed-scale assumption. Second, it jointly optimizes skip-step strategies and dynamic guidance scales via evolutionary algorithms. Third, it designs a hierarchical, Transformer block-wise adaptive low-rank KV cache mechanism, enabling per-block rank calibration. Evaluated on DiT-XL/2, PixArt-α, and FLUX, the method achieves 53%, 60%, and 5× FLOPs reduction, respectively, while improving quality by 15% and 16.1% on the first two and surpassing the 50-step baseline's CLIP Score on FLUX, demonstrating a strong balance between inference efficiency and generation fidelity.

📝 Abstract
Diffusion models have emerged as the dominant paradigm for high-quality image generation, yet their computational expense remains substantial due to iterative denoising. Classifier-Free Guidance (CFG) significantly enhances generation quality and controllability but doubles the computation by requiring both conditional and unconditional forward passes at every timestep. We present OUSAC (Optimized gUidance Scheduling with Adaptive Caching), a framework that accelerates diffusion transformers (DiT) through systematic optimization. Our key insight is that variable guidance scales enable sparse computation: adjusting scales at certain timesteps can compensate for skipping CFG at others, enabling both fewer total sampling steps and fewer CFG steps while maintaining quality. However, variable guidance patterns introduce denoising deviations that undermine standard caching methods, which assume constant CFG scales across steps. Moreover, different transformer blocks are affected at different levels under dynamic conditions. This paper develops a two-stage approach leveraging these insights. Stage-1 employs evolutionary algorithms to jointly optimize which timesteps to skip and what guidance scale to use, eliminating up to 82% of unconditional passes. Stage-2 introduces adaptive rank allocation that tailors calibration efforts per transformer block, maintaining caching effectiveness under variable guidance. Experiments demonstrate that OUSAC significantly outperforms state-of-the-art acceleration methods, achieving 53% computational savings with 15% quality improvement on DiT-XL/2 (ImageNet 512x512), 60% savings with 16.1% improvement on PixArt-alpha (MSCOCO), and 5x speedup on FLUX while improving CLIP Score over the 50-step baseline.
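The abstract's key insight, that variable guidance scales allow skipping the unconditional pass at some timesteps, can be illustrated with a minimal sketch. The `cfg_step` combination is the standard CFG formula; `sparse_cfg_denoise`, its skip mask, per-step scales, and the simple Euler-style update are illustrative assumptions, not the paper's actual sampler.

```python
def cfg_step(eps_cond, eps_uncond, scale):
    # Standard CFG combination: push the prediction along the conditional direction.
    return eps_uncond + scale * (eps_cond - eps_uncond)

def sparse_cfg_denoise(eps_cond_fn, eps_uncond_fn, x, timesteps,
                       skip_mask, scales, step_size=0.1):
    """Denoising loop where the unconditional pass (and hence CFG) is skipped
    at masked timesteps; the remaining steps use per-step guidance scales
    chosen to compensate. The update rule is a toy placeholder."""
    for i, t in enumerate(timesteps):
        eps_c = eps_cond_fn(x, t)
        if skip_mask[i]:
            eps = eps_c  # conditional-only: the unconditional forward pass is saved
        else:
            eps_u = eps_uncond_fn(x, t)
            eps = cfg_step(eps_c, eps_u, scales[i])
        x = x - step_size * eps
    return x
```

Each skipped step halves that step's network cost, which is where the reported reduction in unconditional passes comes from; the optimization problem is then choosing the mask and the compensating scales jointly.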
Problem

Research questions and friction points this paper is trying to address.

CFG doubles DiT inference cost by requiring conditional and unconditional passes at every timestep
Skipping guidance steps degrades image quality unless guidance scales are rescheduled to compensate
Standard caching methods assume constant CFG scales and break under variable guidance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Evolutionary algorithm optimizes guidance schedule and skips steps
Adaptive rank allocation maintains cache effectiveness per block
Variable guidance scales reduce unconditional passes by up to 82%
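The joint search described above (which timesteps to skip, and what scale to use elsewhere) can be sketched as a toy elitist evolutionary loop. Everything here is a hypothetical illustration: the genome layout, mutation operators, and `fitness` callback are assumptions standing in for the paper's actual evolutionary algorithm and quality/compute objective.

```python
import random

def evolve_schedule(num_steps, fitness, generations=50, pop_size=16, seed=0):
    """Toy (1+lambda)-style evolutionary search over a joint genome:
    a binary CFG-skip mask plus per-step guidance scales."""
    rng = random.Random(seed)

    def random_genome():
        mask = [rng.random() < 0.5 for _ in range(num_steps)]
        scales = [rng.uniform(1.0, 8.0) for _ in range(num_steps)]
        return mask, scales

    def mutate(genome):
        mask, scales = list(genome[0]), list(genome[1])
        i = rng.randrange(num_steps)
        if rng.random() < 0.5:
            mask[i] = not mask[i]          # toggle skipping CFG at step i
        else:                              # perturb the guidance scale at step i
            scales[i] = min(8.0, max(1.0, scales[i] + rng.gauss(0.0, 0.5)))
        return mask, scales

    best = random_genome()
    best_fit = fitness(*best)
    for _ in range(generations):
        for child in (mutate(best) for _ in range(pop_size)):
            f = fitness(*child)
            if f > best_fit:               # elitist: keep the best genome seen
                best, best_fit = child, f
    return best, best_fit
```

In practice the fitness would trade off compute savings (more skipped steps) against a measured quality metric; the elitist update guarantees the schedule never gets worse across generations.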
Ruitong Sun — School of Computing, University of Georgia
Tianze Yang — University of Georgia (Machine Learning, Computer Vision)
Wei Niu — School of Computing, University of Georgia
Jin Sun — Assistant Professor, University of Georgia (Computer Vision)