ERTACache: Error Rectification and Timesteps Adjustment for Efficient Diffusion

📅 2025-08-27
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Diffusion models suffer from high computational overhead due to iterative sampling, and existing feature caching methods accumulate errors from naive reuse: (i) feature shift error, arising from inaccurate cached outputs, and (ii) step amplification error, caused by error propagation under fixed-step scheduling. This paper proposes ERTACache, an error-aware feature caching framework for diffusion models that decouples caching error into these two analyzable components. A trajectory-aware dynamic correction mechanism, combined with a closed-form residual linearization model, mitigates both errors. Leveraging offline residual profiling, adaptive integration-interval adjustment, and closed-form error modeling, the method substantially improves caching fidelity. Evaluated on image and video generation tasks, it achieves up to 2x speedup without compromising visual quality; on Wan2.1, VBench scores remain nearly lossless. The framework thus bridges efficiency and perceptual fidelity in diffusion-based generation.

πŸ“ Abstract
Diffusion models suffer from substantial computational overhead due to their inherently iterative inference process. While feature caching offers a promising acceleration strategy by reusing intermediate outputs across timesteps, naive reuse often incurs noticeable quality degradation. In this work, we formally analyze the cumulative error introduced by caching and decompose it into two principal components: feature shift error, caused by inaccuracies in cached outputs, and step amplification error, which arises from error propagation under fixed timestep schedules. To address these issues, we propose ERTACache, a principled caching framework that jointly rectifies both error types. Our method employs an offline residual profiling stage to identify reusable steps, dynamically adjusts integration intervals via a trajectory-aware correction coefficient, and analytically approximates cache-induced errors through a closed-form residual linearization model. Together, these components enable accurate and efficient sampling under aggressive cache reuse. Extensive experiments across standard image and video generation benchmarks show that ERTACache achieves up to 2x inference speedup while consistently preserving or even improving visual quality. Notably, on the state-of-the-art Wan2.1 video diffusion model, ERTACache delivers 2x acceleration with minimal VBench degradation, effectively maintaining baseline fidelity while significantly improving efficiency. The code is available at https://github.com/bytedance/ERTACache.
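The two error components in the abstract can be seen in a deliberately tiny sketch (an illustration under assumptions, not the paper's code): a 1-D Euler sampler over a toy field v(x) = -x, where every other step naively reuses the previous model output. Reusing a stale output is the feature shift error; advancing the fixed-step schedule with that stale value compounds it, which is the step amplification error.

```python
def v(x, t):
    # hypothetical stand-in for a diffusion model's output: a contraction field
    return -x

def sample(x0, n_steps, naive_cache=False):
    x, dt, cached = x0, 1.0 / n_steps, None
    for i in range(n_steps):
        if naive_cache and i % 2 == 1 and cached is not None:
            out = cached              # stale reuse: feature shift error
        else:
            out = v(x, i * dt)
            cached = out
        x = x + dt * out              # fixed-step update propagates the error
    return x

exact = sample(1.0, 50)
cached = sample(1.0, 50, naive_cache=True)
drift = abs(cached - exact)           # cumulative caching error
```

Halving the number of model calls this way produces a small but systematic trajectory drift, and the drift grows with more aggressive reuse; this is the gap ERTACache is designed to close.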
Problem

Research questions and friction points this paper is trying to address.

Reduces computational overhead in diffusion model inference
Addresses quality degradation from naive feature caching
Rectifies cumulative error from feature shift and step amplification
Innovation

Methods, ideas, or system contributions that make the work stand out.

Offline residual profiling identifies reusable steps
Dynamic adjustment of integration intervals via correction
Closed-form residual linearization approximates cache errors
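The three mechanisms above can be sketched in miniature (hypothetical: the slope `k` and interval coefficient `gamma` stand in for quantities ERTACache fits offline in closed form; this is not the paper's implementation). A cached output is rectified by a first-order residual term before reuse, and the effective integration interval for cached steps can be rescaled:

```python
def corrected_reuse(cached_out, x_now, x_cached, gamma=1.0, k=-1.0):
    # first-order (linearized) residual: estimate of how the model output
    # drifted since the feature was cached (k is a hypothetical fitted slope)
    residual = k * (x_now - x_cached)
    return gamma * (cached_out + residual)  # gamma rescales the cached step

def sample_corrected(x0, n_steps):
    x, dt = x0, 1.0 / n_steps
    cached_out, cached_x = None, None
    for i in range(n_steps):
        if i % 2 == 1 and cached_out is not None:
            out = corrected_reuse(cached_out, x, cached_x)
        else:
            out = -x                        # toy model output v(x) = -x
            cached_out, cached_x = out, x
        x = x + dt * out
    return x
```

For this linear toy field the residual linearization is exact (the slope -1 matches dv/dx), so cached sampling with correction reproduces the full-computation trajectory while still halving model calls. In a real diffusion model the fitted linear model is only approximate, which is why ERTACache additionally profiles residuals offline to pick reusable steps and adjusts integration intervals along the trajectory.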
Xurui Peng, ByteDance Inc.
Hong Liu, ByteDance Inc.
Chenqian Yan, Xiamen University (Model Compression)
Rui Ma, ByteDance Inc.
Fangmin Chen, ByteDance Inc.
Xing Wang, ByteDance Inc.
Zhihua Wu, ByteDance Inc.
Songwei Liu, ByteDance Inc.
Mingbao Lin, Principal Research Scientist, Rakuten (Model Compression (Multimodal), LLMs, Diffusion Models)