🤖 AI Summary
This work addresses the high sampling cost of Diffusion Transformers (DiTs) and the failure of existing hand-crafted feature caching methods under aggressive step skipping. To this end, the authors propose L2P (Learnable Linear Predictor), a data-driven feature caching framework that introduces, for the first time, a learnable per-timestep linear predictor into the caching mechanism. This lightweight module reconstructs the current features from the trajectory of previously computed features, replacing the fixed coefficients of formula-based approaches with learned weights. After about 20 seconds of training on a single GPU, L2P achieves a 4.55× reduction in FLOPs and a 4.15× latency speedup on FLUX.1-dev, and it maintains high visual fidelity at up to 7.18× acceleration on Qwen-Image models.
📝 Abstract
To address the high sampling cost of Diffusion Transformers (DiTs), feature caching offers a training-free acceleration method. However, existing methods rely on hand-crafted forecasting formulas that fail under aggressive skipping. We propose L2P (Learnable Linear Predictor), a simple data-driven caching framework that replaces fixed coefficients with learnable per-timestep weights. Trained in ~20 seconds on a single GPU, L2P accurately reconstructs current features from past trajectories. L2P significantly outperforms existing baselines: it achieves a 4.55× FLOPs reduction and a 4.15× latency speedup on FLUX.1-dev, and maintains high visual fidelity under up to 7.18× acceleration on Qwen-Image models, where prior methods show noticeable quality degradation. Our results show that learning linear predictors is highly effective for efficient DiT inference. Code is available at https://github.com/Aredstone/L2P-Cache.
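To make the idea of "learnable per-timestep weights" concrete, the following is a minimal sketch and not the authors' implementation. It assumes the predictor is a weighted sum of the last few cached features with one weight vector per timestep, and it fits those weights by closed-form least squares on synthetic trajectories, a stand-in for whatever training procedure L2P actually uses. The function names, the `history` parameter, and the toy data are all hypothetical.

```python
import numpy as np

def fit_per_timestep_weights(trajs, history=2):
    """Fit one weight vector per timestep (illustrative, not the L2P code).

    trajs: array of shape (N, T, D) holding N calibration feature trajectories.
    Returns an array of shape (T, history); rows for t < history stay zero.
    """
    _, num_steps, _ = trajs.shape
    weights = np.zeros((num_steps, history))
    for t in range(history, num_steps):
        # Each column holds one past feature, flattened over samples and channels.
        A = np.stack([trajs[:, t - k].ravel() for k in range(1, history + 1)], axis=1)
        b = trajs[:, t].ravel()
        weights[t] = np.linalg.lstsq(A, b, rcond=None)[0]
    return weights

def predict(weights, past, t):
    """Predict the feature at step t; `past` lists cached features, most recent first."""
    return sum(w * f for w, f in zip(weights[t], past))

# Synthetic trajectories (assumption: smooth per-sample dynamics plus small noise).
rng = np.random.default_rng(0)
steps = np.arange(10)
base = rng.normal(size=(32, 1, 16))
drift = rng.normal(size=(32, 1, 16))
trajs = (base * np.exp(-0.3 * steps)[None, :, None]
         + drift * np.sqrt(steps + 1.0)[None, :, None])
trajs += 0.01 * rng.normal(size=trajs.shape)

weights = fit_per_timestep_weights(trajs, history=2)

# Compare against a fixed-coefficient formula: linear extrapolation 2*f[t-1] - f[t-2].
t = 7
past = [trajs[:, t - 1], trajs[:, t - 2]]
learned = predict(weights, past, t)
fixed = 2.0 * past[0] - past[1]
learned_mse = float(np.mean((learned - trajs[:, t]) ** 2))
fixed_mse = float(np.mean((fixed - trajs[:, t]) ** 2))
print(f"learned MSE: {learned_mse:.6f}, fixed-formula MSE: {fixed_mse:.6f}")
```

On the data used for fitting, the learned weights cannot do worse than the fixed coefficients, because (2, -1) is one of the candidates the least-squares fit searches over. A real evaluation would measure the error on held-out prompts and on actual DiT features rather than on this toy data.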