Forecast then Calibrate: Feature Caching as ODE for Efficient Diffusion Transformers

πŸ“… 2025-08-22
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
Existing feature caching methods struggle to preserve DiT generation quality at high acceleration ratios, primarily because large-step predictions accumulate error. This work proposes FoCa, the first framework to model hidden-feature evolution as an ordinary differential equation (ODE) trajectory and to reformulate feature caching as an ODE-solving problem. FoCa introduces a training-free prediction-correction mechanism that stably reuses and refines historical features even under aggressive step-skipping, effectively suppressing error propagation. Evaluated on image generation, video generation, and super-resolution tasks, FoCa achieves near-lossless speedups of 5.50Γ— on FLUX, 6.45Γ— on HunyuanVideo, and 3.17Γ— on Inf-DiT, and maintains high quality at a 4.53Γ— speedup on DiT. The method substantially improves the inference efficiency of Diffusion Transformers while preserving perceptual and quantitative quality.
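The page describes FoCa only at this high level, so the following is a minimal, hedged Python sketch of the general forecast-then-calibrate pattern, not FoCa's actual algorithm or API. The function name, the `interval` and `calibrate_weight` parameters, the first-order extrapolation, and the convex blend are all illustrative assumptions.

```python
import torch

def forecast_then_calibrate(block, h, num_steps=50, interval=5, calibrate_weight=0.5):
    """Toy forecast-then-calibrate caching loop (illustrative, not FoCa's API).

    `block` stands in for an expensive DiT block. A full pass runs only on
    warmup steps and every `interval`-th step; skipped steps are filled in by
    extrapolating the cached feature trajectory, and each full pass is blended
    with the running forecast to damp accumulated extrapolation error.
    """
    prev = prev_prev = None
    for step in range(num_steps):
        if prev_prev is None or step % interval == 0:
            fresh = block(h)  # expensive full computation
            if prev_prev is not None:
                # Calibrate: pull the extrapolated trajectory toward the
                # freshly computed features (a simple convex blend here).
                forecast = prev + (prev - prev_prev)
                fresh = calibrate_weight * fresh + (1.0 - calibrate_weight) * forecast
            h = fresh
        else:
            # Forecast: explicit-Euler-style step of the feature-ODE, with
            # dh/dt estimated by a backward finite difference of the cache.
            h = prev + (prev - prev_prev)
        prev_prev, prev = prev, h
    return h

# Usage with a dummy block:
# out = forecast_then_calibrate(lambda x: 0.9 * x, torch.randn(1, 16, 64))
```

A real implementation would cache per-layer features inside the transformer rather than the loop output, and FoCa's actual predictor and corrector are presumably higher-order; the point here is only the two-phase structure: cheap extrapolation on skipped steps, periodic correction against a real pass.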

πŸ“ Abstract
Diffusion Transformers (DiTs) have demonstrated exceptional performance in high-fidelity image and video generation. To reduce their substantial computational costs, feature caching techniques have been proposed to accelerate inference by reusing hidden representations from previous timesteps. However, current methods often struggle to maintain generation quality at high acceleration ratios, where prediction errors increase sharply due to the inherent instability of long-step forecasting. In this work, we adopt an ordinary differential equation (ODE) perspective on the hidden-feature sequence, modeling layer representations along the trajectory as a feature-ODE. We attribute the degradation of existing caching strategies to their inability to robustly integrate historical features under large skipping intervals. To address this, we propose FoCa (Forecast-then-Calibrate), which treats feature caching as a feature-ODE solving problem. Extensive experiments on image synthesis, video generation, and super-resolution tasks demonstrate the effectiveness of FoCa, especially under aggressive acceleration. Without additional training, FoCa achieves near-lossless speedups of 5.50 times on FLUX, 6.45 times on HunyuanVideo, 3.17 times on Inf-DiT, and maintains high quality with a 4.53 times speedup on DiT.
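The abstract's feature-ODE framing can be written schematically as follows. The notation (h_ell for layer features, f_ell for their vector field, Delta t for the step size) is assumed for illustration, since the page does not reproduce the paper's equations:

```latex
% Hidden features of layer \ell along the sampling trajectory are treated
% as samples of an ODE (schematic; notation assumed, not the paper's form):
\frac{\mathrm{d}\,h_\ell(t)}{\mathrm{d}t} = f_\ell\bigl(h_\ell(t),\, t\bigr),
\qquad
f_\ell\bigl(h_\ell(t), t\bigr) \;\approx\; \frac{h_\ell(t) - h_\ell(t - \Delta t)}{\Delta t}.
```

Under this view, skipping k timesteps is one large integration step of the feature-ODE, so the quality loss of naive caching is the local truncation error of that step, which grows with the skip length. This is the instability of long-step forecasting that the abstract refers to.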
Problem

Research questions and friction points this paper is trying to address.

Reducing computational costs in Diffusion Transformers
Maintaining generation quality under high acceleration
Addressing prediction errors from long-step forecasting
Innovation

Methods, ideas, or system contributions that make the work stand out.

Feature caching as ODE solving
Forecast-then-calibrate integration strategy (see the sketch after this list)
Maintains quality under aggressive acceleration
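Read as a numerical-integration scheme, "forecast then calibrate" has the shape of a predictor-corrector pair. The following schematic is a hedged reading rather than the paper's actual update rules; the blend coefficient lambda, the slope estimate, and the step notation are all assumptions:

```latex
% Predictor (forecast): extrapolate k skipped steps from cached features,
% using a finite-difference slope estimate \tilde{f}_t of the feature-ODE:
\hat{h}_{t+k} = h_t + k\,\Delta t\,\tilde{f}_t
% Corrector (calibrate): when a fresh full pass h^{\mathrm{fresh}}_{t+k}
% is available, pull the forecast toward it to suppress error buildup:
h_{t+k} = \hat{h}_{t+k} + \lambda\,\bigl(h^{\mathrm{fresh}}_{t+k} - \hat{h}_{t+k}\bigr)
```

With lambda = 1 the corrector reduces to plain recomputation at calibration steps; intermediate values trade recomputation cost against drift, which is consistent with the claim that quality holds up under aggressive step-skipping.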
πŸ‘₯ Authors
Shikang Zheng (Shanghai Jiao Tong University)
Liang Feng (Shanghai Jiao Tong University; Fudan University)
Xinyu Wang (Shanghai Jiao Tong University)
Qinming Zhou (Shanghai Jiao Tong University; Tsinghua University)
Peiliang Cai (Shanghai Jiao Tong University)
Chang Zou (Intern at EPIC Lab, Shanghai Jiao Tong University). Interests: generative models; image and video generation.
Jiacheng Liu (Shanghai Jiao Tong University)
Yuqi Lin (Zhejiang University). Interests: computer vision; multimodal foundation models.
Junjie Chen (Shanghai Jiao Tong University)
Yue Ma (ByteDance). Interests: NLP; dialogue systems; LLMs.
Linfeng Zhang (DP Technology; AI for Science Institute). Interests: AI for science; multi-scale modeling; molecular simulation; drug/materials design.