Adaptive Spectral Feature Forecasting for Diffusion Sampling Acceleration

📅 2026-03-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the inference inefficiency of diffusion models caused by existing feature caching methods, which rely on local approximations that accumulate error and degrade generation quality when many sampling steps are skipped. The authors propose Spectrum, a training-free spectral diffusion feature predictor that models the denoiser's latent features as functions of time and approximates them globally with Chebyshev polynomials. The basis coefficients are fit via ridge regression and then used to forecast features at multiple future diffusion steps with high accuracy. Theoretically, the resulting approximation error does not compound as the skip size grows, overcoming the key limitation of local approximation. Experiments demonstrate up to 4.79× and 4.67× acceleration on FLUX.1 and Wan2.1-14B, respectively, with generation quality significantly surpassing current baselines.

📝 Abstract
Diffusion models have become the dominant tool for high-fidelity image and video generation, yet they are critically bottlenecked by inference speed due to the numerous iterative passes of Diffusion Transformers. To reduce this heavy compute, recent works adopt feature caching and reuse schemes that skip network evaluations at selected diffusion steps by substituting features cached at previous steps. However, these designs rely solely on local approximation, so errors grow rapidly with large skips, degrading sample quality at high speedups. In this work, we propose the spectral diffusion feature forecaster (Spectrum), a training-free approach that enables global, long-range feature reuse with tightly controlled error. In particular, we view the latent features of the denoiser as functions over time and approximate them with Chebyshev polynomials. Specifically, we fit the coefficient for each basis function via ridge regression, then use the fitted expansion to forecast features at multiple future diffusion steps. We theoretically show that our approach admits more favorable long-horizon behavior and yields an error bound that does not compound with the step size. Extensive experiments on state-of-the-art image and video diffusion models consistently verify the superiority of our approach. Notably, we achieve up to 4.79$\times$ speedup on FLUX.1 and 4.67$\times$ speedup on Wan2.1-14B, while maintaining much higher sample quality than the baselines.
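The core fitting step described in the abstract, treating each feature dimension as a function of time, projecting observed values onto a Chebyshev basis, and solving for coefficients with ridge regression, can be sketched in a few lines of numpy. This is a minimal illustration of the general technique, not the paper's implementation: function names, the degree, and the regularization strength `lam` are all hypothetical choices.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def fit_chebyshev_ridge(times, feats, degree=4, lam=1e-3):
    """Fit per-dimension Chebyshev coefficients to observed features.

    times: (T,) diffusion times in [0, 1]
    feats: (T, D) latent feature vectors observed at those times
    Returns coeffs of shape (degree + 1, D).
    """
    x = 2.0 * np.asarray(times) - 1.0           # map [0, 1] -> [-1, 1], the Chebyshev domain
    V = C.chebvander(x, degree)                 # (T, degree+1) design matrix of basis values
    A = V.T @ V + lam * np.eye(degree + 1)      # ridge-regularized normal equations
    return np.linalg.solve(A, V.T @ feats)      # closed-form ridge solution

def forecast(coeffs, future_times):
    """Evaluate the fitted expansion at future diffusion times."""
    x = 2.0 * np.asarray(future_times) - 1.0
    V = C.chebvander(x, coeffs.shape[0] - 1)
    return V @ coeffs                           # (T_future, D) predicted features
```

Because the fit is global over the whole time interval rather than a local extrapolation from the last cached step, predictions at distant future steps do not inherit compounded step-by-step error, which is the behavior the paper's error bound formalizes.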
Problem

Research questions and friction points this paper is trying to address.

diffusion models
inference acceleration
feature reuse
sampling speedup
error accumulation
Innovation

Methods, ideas, or system contributions that make the work stand out.

diffusion sampling acceleration
spectral forecasting
Chebyshev approximation
feature reuse
training-free acceleration