OUSAC: Optimized Guidance Scheduling with Adaptive Caching for DiT Acceleration

📅 2025-12-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
Diffusion Transformers (DiTs) suffer from low inference efficiency, and classifier-free guidance (CFG), while improving sample quality and controllability, doubles computational cost. This paper proposes an efficient CFG inference framework tailored for DiTs. First, it introduces a novel sparse CFG scheduling paradigm under *variable* guidance scales, departing from the conventional fixed-scale assumption. Second, it jointly optimizes skip-step strategies and dynamic guidance scales via evolutionary algorithms. Third, it designs a hierarchical, Transformer block-wise adaptive low-rank KV cache mechanism, enabling per-block rank calibration. Evaluated on DiT-XL/2, PixArt-α, and FLUX, the method achieves 53%, 60%, and 5× FLOPs reduction, respectively, while improving quality by 15% and 16.1% on the first two and surpassing the 50-step baseline's CLIP Score on FLUX, demonstrating a strong balance between inference efficiency and generation fidelity.

📝 Abstract
Diffusion models have emerged as the dominant paradigm for high-quality image generation, yet their computational expense remains substantial due to iterative denoising. Classifier-Free Guidance (CFG) significantly enhances generation quality and controllability but doubles the computation by requiring both conditional and unconditional forward passes at every timestep. We present OUSAC (Optimized gUidance Scheduling with Adaptive Caching), a framework that accelerates diffusion transformers (DiT) through systematic optimization. Our key insight is that variable guidance scales enable sparse computation: adjusting scales at certain timesteps can compensate for skipping CFG at others, enabling both fewer total sampling steps and fewer CFG steps while maintaining quality. However, variable guidance patterns introduce denoising deviations that undermine standard caching methods, which assume constant CFG scales across steps. Moreover, different transformer blocks are affected at different levels under dynamic conditions. This paper develops a two-stage approach leveraging these insights. Stage-1 employs evolutionary algorithms to jointly optimize which timesteps to skip and what guidance scale to use, eliminating up to 82% of unconditional passes. Stage-2 introduces adaptive rank allocation that tailors calibration efforts per transformer block, maintaining caching effectiveness under variable guidance. Experiments demonstrate that OUSAC significantly outperforms state-of-the-art acceleration methods, achieving 53% computational savings with 15% quality improvement on DiT-XL/2 (ImageNet 512x512), 60% savings with 16.1% improvement on PixArt-alpha (MSCOCO), and 5x speedup on FLUX while improving CLIP Score over the 50-step baseline.
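The abstract's key insight, that variable guidance scales allow skipping the unconditional pass at some timesteps, can be illustrated with a minimal sketch. The `cfg_step` combination is the standard CFG formula; `sparse_cfg_denoise`, its skip mask, per-step scales, and the simple Euler-style update are illustrative assumptions, not the paper's actual sampler.

```python
def cfg_step(eps_cond, eps_uncond, scale):
    # Standard CFG combination: push the prediction along the conditional direction.
    return eps_uncond + scale * (eps_cond - eps_uncond)

def sparse_cfg_denoise(eps_cond_fn, eps_uncond_fn, x, timesteps,
                       skip_mask, scales, step_size=0.1):
    """Denoising loop where the unconditional pass (and hence CFG) is skipped
    at masked timesteps; the remaining steps use per-step guidance scales
    chosen to compensate. The update rule is a toy placeholder."""
    for i, t in enumerate(timesteps):
        eps_c = eps_cond_fn(x, t)
        if skip_mask[i]:
            eps = eps_c  # conditional-only: the unconditional forward pass is saved
        else:
            eps_u = eps_uncond_fn(x, t)
            eps = cfg_step(eps_c, eps_u, scales[i])
        x = x - step_size * eps
    return x
```

Each skipped step halves that step's network cost, which is where the reported reduction in unconditional passes comes from; the optimization problem is then choosing the mask and the compensating scales jointly.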
Problem

Research questions and friction points this paper is trying to address.

CFG doubles DiT inference cost by requiring conditional and unconditional passes at every timestep
Skipping guidance steps degrades image quality unless guidance scales are rescheduled to compensate
Standard caching methods assume constant CFG scales and break under variable guidance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Evolutionary algorithm optimizes guidance schedule and skips steps
Adaptive rank allocation maintains cache effectiveness per block
Variable guidance scales reduce unconditional passes by up to 82%
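The joint search described above (which timesteps to skip, and what scale to use elsewhere) can be sketched as a toy elitist evolutionary loop. Everything here is a hypothetical illustration: the genome layout, mutation operators, and `fitness` callback are assumptions standing in for the paper's actual evolutionary algorithm and quality/compute objective.

```python
import random

def evolve_schedule(num_steps, fitness, generations=50, pop_size=16, seed=0):
    """Toy (1+lambda)-style evolutionary search over a joint genome:
    a binary CFG-skip mask plus per-step guidance scales."""
    rng = random.Random(seed)

    def random_genome():
        mask = [rng.random() < 0.5 for _ in range(num_steps)]
        scales = [rng.uniform(1.0, 8.0) for _ in range(num_steps)]
        return mask, scales

    def mutate(genome):
        mask, scales = list(genome[0]), list(genome[1])
        i = rng.randrange(num_steps)
        if rng.random() < 0.5:
            mask[i] = not mask[i]          # toggle skipping CFG at step i
        else:                              # perturb the guidance scale at step i
            scales[i] = min(8.0, max(1.0, scales[i] + rng.gauss(0.0, 0.5)))
        return mask, scales

    best = random_genome()
    best_fit = fitness(*best)
    for _ in range(generations):
        for child in (mutate(best) for _ in range(pop_size)):
            f = fitness(*child)
            if f > best_fit:               # elitist: keep the best genome seen
                best, best_fit = child, f
    return best, best_fit
```

In practice the fitness would trade off compute savings (more skipped steps) against a measured quality metric; the elitist update guarantees the schedule never gets worse across generations.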
Ruitong Sun — School of Computing, University of Georgia
Tianze Yang — University of Georgia (Machine Learning, Computer Vision)
Wei Niu — School of Computing, University of Georgia
Jin Sun — Assistant Professor, University of Georgia (Computer Vision)