FreqCa: Accelerating Diffusion Models via Frequency-Aware Caching

📅 2025-10-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Diffusion Transformers incur high inference costs, and existing feature caching methods rely on the assumption that features in adjacent timesteps are similar or continuous, which limits their generalizability. This work analyzes feature dynamics in the frequency domain, revealing that low-frequency components (which determine image structure) show high similarity across timesteps but poor continuity, while high-frequency components (which carry image details) show strong continuity but poor similarity. Based on this insight, the proposed Frequency-aware Caching (FreqCa) directly reuses low-frequency features based on their similarity, and predicts the volatile high-frequency features with a second-order Hermite interpolator based on their continuity. In addition, caching a single Cumulative Residual Feature (CRF) instead of per-layer features reduces the memory footprint of caching by 99%. Experiments on state-of-the-art models including FLUX.1-dev and Qwen-Image demonstrate significant acceleration in both image generation and editing, with no perceptible degradation in output quality.

📝 Abstract
The application of diffusion transformers suffers from their significant inference costs. Recently, feature caching has been proposed to solve this problem by reusing features from previous timesteps, thereby skipping computation in future timesteps. However, previous feature caching assumes that features in adjacent timesteps are similar or continuous, which does not hold in all settings. To investigate this, this paper begins with an analysis in the frequency domain, which reveals that different frequency bands in the features of diffusion models exhibit different dynamics across timesteps. Concretely, low-frequency components, which decide the structure of images, exhibit higher similarity but poor continuity. In contrast, the high-frequency bands, which encode the details of images, show significant continuity but poor similarity. These observations motivate us to propose Frequency-aware Caching (FreqCa), which directly reuses features of low-frequency components based on their similarity, while using a second-order Hermite interpolator to predict the volatile high-frequency ones based on their continuity. Besides, we further propose to cache the Cumulative Residual Feature (CRF) instead of the features in all the layers, which reduces the memory footprint of feature caching by 99%. Extensive experiments on FLUX.1-dev, FLUX.1-Kontext-dev, Qwen-Image, and Qwen-Image-Edit demonstrate its effectiveness in both generation and editing. Code is available in the supplementary materials and will be released on GitHub.
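As a rough illustration of the two regimes the abstract describes, the following minimal sketch splits a feature map into frequency bands and extrapolates the high-frequency part. The radial FFT cutoff and the finite-difference form of the second-order predictor are assumptions for illustration (`split_bands`, `predict_high`, and `cutoff` are hypothetical names), not the paper's exact implementation:

```python
import numpy as np

def split_bands(feat, cutoff=0.25):
    """Split a 2D feature map into low- and high-frequency bands with a
    radial FFT mask. The cutoff and masking scheme are illustrative choices,
    not the paper's exact decomposition."""
    spectrum = np.fft.fftshift(np.fft.fft2(feat))
    h, w = feat.shape
    yy, xx = np.ogrid[:h, :w]
    cy, cx = h // 2, w // 2
    mask = (yy - cy) ** 2 + (xx - cx) ** 2 <= (cutoff * min(h, w)) ** 2
    low = np.fft.ifft2(np.fft.ifftshift(spectrum * mask)).real
    return low, feat - low  # low band is reused; high band is predicted

def predict_high(h_old, h_mid, h_new):
    """Second-order extrapolation of the next high-frequency feature from the
    three most recent cached ones (oldest first). A finite-difference stand-in
    for the paper's second-order Hermite interpolator."""
    d1 = h_new - h_mid            # first difference
    d2 = d1 - (h_mid - h_old)     # second difference
    return h_new + d1 + d2        # exact for quadratic trajectories
```

The predictor exploits the continuity of the high band: because consecutive high-frequency features evolve continuously, a low-order polynomial fit to recent timesteps can stand in for a full forward pass.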
Problem

Research questions and friction points this paper is trying to address.

Reducing diffusion models' high inference costs through frequency analysis
Addressing the limitation of feature caching that assumes similarity between adjacent timesteps
Optimizing feature reuse and prediction across different frequency bands
Innovation

Methods, ideas, or system contributions that make the work stand out.

Frequency-aware caching reuses low-frequency features directly
Hermite interpolator predicts volatile high-frequency components
Cumulative Residual Feature caching reduces memory footprint by 99%
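The memory saving in the last bullet follows from a property of residual stacks: the final output equals the input plus the sum of all per-block residuals, so one accumulated tensor can replace a per-layer cache. A minimal sketch of this idea, assuming NumPy arrays and stand-in residual branches (`forward_with_crf` and `layers` are hypothetical names, not the paper's API):

```python
import numpy as np

def forward_with_crf(x, layers):
    """Forward pass through a residual stack that accumulates a single
    Cumulative Residual Feature (CRF) instead of caching every layer's
    output. `layers` are stand-in residual branches, not real DiT blocks."""
    crf = np.zeros_like(x)
    for layer in layers:
        r = layer(x)    # residual-branch output of this block
        x = x + r       # standard residual update
        crf = crf + r   # one tensor summarizes all blocks' residuals
    return x, crf       # on cached timesteps: x_out = x_in + crf
```

Caching one tensor instead of one per layer is what makes the roughly 99% memory reduction plausible for deep transformer stacks.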
Jiacheng Liu
EPIC Lab, SJTU
Peiliang Cai
EPIC Lab, SJTU
Qinming Zhou
EPIC Lab, SJTU
Yuqi Lin
Zhejiang University
Computer Vision · Multimodal Foundation Model
Deyang Kong
Peking University
Natural Language Processing
Benhao Huang
EPIC Lab, SJTU
Yupei Pan
EPIC Lab, SJTU
Haowen Xu
EPIC Lab, SJTU
Chang Zou
Intern at EPIC Lab, Shanghai Jiao Tong University
Generative Models · Image and Video Generation
Junshu Tang
Tencent Hunyuan
Shikang Zheng
EPIC Lab, SJTU
Linfeng Zhang
DP Technology; AI for Science Institute
AI for Science · multi-scale modeling · molecular simulation · drug/materials design