Let Features Decide Their Own Solvers: Hybrid Feature Caching for Diffusion Transformers

📅 2025-10-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
Diffusion Transformers (DiTs) suffer from low sampling efficiency because every denoising step requires a full Transformer forward pass. Existing feature caching strategies apply uniform policies across all feature dimensions, ignoring their heterogeneous dynamic behaviors. To address this, we propose HyCa, a hybrid feature caching framework that, for the first time, models latent feature evolution as a dimension-wise mixed ordinary differential equation (ODE) system. Building on this model, HyCa introduces a training-free, near-lossless, dimension-adaptive caching mechanism in which ODE solvers regulate caching frequency per dimension. Evaluated on state-of-the-art DiT models, including FLUX, HunyuanVideo, and Qwen-Image, HyCa achieves 5.55–6.24× inference speedup while degrading key quality metrics (e.g., FID and CLIP Score) by less than 0.5%, substantially outperforming conventional uniform caching strategies.
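A minimal toy sketch of the dimension-wise caching idea described above: each feature dimension is assigned its own cheap "solver" (hold the cached value vs. linearly extrapolate it) based on how fast that dimension evolves, so cached steps can skip the full Transformer pass. The threshold rule, function names, and solver choices here are illustrative assumptions, not the paper's actual selection procedure.

```python
import numpy as np

def assign_solvers(history, threshold=0.1):
    """Assign a per-dimension caching strategy from recent feature dynamics.

    Slowly changing dimensions are cheap to reuse as-is (zeroth-order hold);
    faster-moving dimensions get a first-order linear extrapolation. The
    velocity threshold is an illustrative stand-in for HyCa's solver selection.
    """
    velocity = np.abs(history[-1] - history[-2])  # per-dimension change rate
    return np.where(velocity < threshold, 0, 1)   # 0: reuse, 1: linear

def cached_step(history, solver_order):
    """Approximate the next hidden feature without a Transformer forward pass."""
    prev, curr = history[-2], history[-1]
    reuse = curr                    # zeroth-order: hold the last value
    linear = curr + (curr - prev)   # first-order: extrapolate the trend
    return np.where(solver_order == 0, reuse, linear)

# Toy trajectory: dimension 0 is nearly static, dimension 1 drifts linearly.
h_t0 = np.array([1.00, 0.0])
h_t1 = np.array([1.01, 0.5])
history = [h_t0, h_t1]

orders = assign_solvers(history)           # -> [0, 1]
h_t2_approx = cached_step(history, orders)  # -> [1.01, 1.0]
```

The point of the sketch is that the two dimensions are handled by different solvers in the same cached step, which is the "mixture of ODEs" view in miniature.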

📝 Abstract
Diffusion Transformers offer state-of-the-art fidelity in image and video synthesis, but their iterative sampling process remains a major bottleneck due to the high cost of transformer forward passes at each timestep. To mitigate this, feature caching has emerged as a training-free acceleration technique that reuses or forecasts hidden representations. However, existing methods often apply a uniform caching strategy across all feature dimensions, ignoring their heterogeneous dynamic behaviors. Therefore, we adopt a new perspective by modeling hidden feature evolution as a mixture of ODEs across dimensions, and introduce HyCa, a Hybrid ODE solver inspired caching framework that applies dimension-wise caching strategies. HyCa achieves near-lossless acceleration across diverse domains and models, including 5.55 times speedup on FLUX, 5.56 times speedup on HunyuanVideo, 6.24 times speedup on Qwen-Image and Qwen-Image-Edit without retraining.
Problem

Research questions and friction points this paper is trying to address.

Accelerating Diffusion Transformers by optimizing feature caching strategies
Addressing heterogeneous feature dynamics with hybrid ODE solver approach
Enabling training-free speedup while maintaining near-lossless generation quality
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid ODE solver inspired caching framework
Dimension-wise caching strategies for features
Modeling feature evolution as mixture of ODEs