Accelerating Diffusion Transformer via Error-Optimized Cache

📅 2025-01-31
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the slow sampling of DiT-based diffusion models and the quality degradation caused by cache acceleration, this paper proposes an Error-Optimized Cache (EOC) mechanism. EOC introduces an error-modeling-driven caching optimization framework comprising three core components: prior knowledge extraction, a criterion for judging which caching steps need optimization, and active suppression of caching errors. By correcting the errors that feature reuse introduces, EOC significantly mitigates the quality degradation caused by over-caching without appreciable computational overhead. On ImageNet, EOC improves FID from 6.857 to 5.821 (over-caching), from 3.870 to 3.692 (rule-based caching), and from 3.539 to 3.451 (training-based caching), demonstrating a compelling trade-off between generation quality and inference efficiency.

📝 Abstract
Diffusion Transformer (DiT) is a crucial method for content generation. However, it requires substantial time to sample. Many studies have attempted to use caching to reduce the time consumption of sampling. Existing caching methods accelerate generation by reusing DiT features from the previous time step and skipping calculations in the next, but they tend to locate and cache low-error modules without focusing on reducing caching-induced errors, resulting in a sharp decline in generated content quality as caching intensity increases. To solve this problem, we propose the Error-Optimized Cache (EOC). This method introduces three key improvements: (1) Prior knowledge extraction: extract and process the caching differences; (2) A judgment method for cache optimization: determine whether certain caching steps need to be optimized; (3) Cache optimization: reduce caching errors. Experiments show that this algorithm significantly reduces the error accumulation caused by caching (especially over-caching). On the ImageNet dataset, without significantly increasing the computational burden, this method improves the quality of the generated images under the over-caching, rule-based, and training-based methods. Specifically, the Fréchet Inception Distance (FID) values are improved as follows: from 6.857 to 5.821, from 3.870 to 3.692, and from 3.539 to 3.451, respectively.
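The caching scheme the abstract describes — reuse a block's features from the previous time step, skip the computation, then suppress the reuse error — can be sketched roughly as follows. This is an illustrative toy under assumed names, not the authors' implementation: `CachedBlock`, the 0.5 damping factor, and the difference-based correction rule are all assumptions standing in for EOC's learned components.

```python
# Illustrative sketch (not the paper's code): time-step feature caching
# for one transformer block, with a simple error-suppression term in the
# spirit of EOC. The correction rule and damping factor are assumptions.
import numpy as np


class CachedBlock:
    """Wraps a block function; reuses the cached output on skip steps."""

    def __init__(self, block_fn):
        self.block_fn = block_fn
        self.cached = None       # feature saved at the last full step
        self.correction = None   # drift observed between consecutive full steps

    def __call__(self, x, full_compute):
        if full_compute or self.cached is None:
            out = self.block_fn(x)
            if self.cached is not None:
                # "prior knowledge extraction": record how far the cached
                # feature drifted from the freshly computed one
                self.correction = out - self.cached
            self.cached = out
            return out
        # skip step: reuse the cache, damping the expected caching error
        if self.correction is not None:
            return self.cached + 0.5 * self.correction
        return self.cached


def demo():
    block = CachedBlock(lambda x: 2.0 * x)        # stand-in for a DiT block
    a = block(np.full(4, 1.0), full_compute=True)  # full step: 2.0
    b = block(np.full(4, 1.1), full_compute=True)  # full step: 2.2, drift 0.2
    c = block(np.full(4, 1.2), full_compute=False) # skip: 2.2 + 0.5*0.2 = 2.3
    return a, b, c
```

On the skip step the corrected output (2.3) lands closer to the true value (2.4) than the raw cached feature (2.2) would, which is the intuition behind suppressing, rather than merely avoiding, caching errors.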
Problem

Research questions and friction points this paper is trying to address.

Diffusion Transformers
Time Efficiency
Quality Degradation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Error-Optimized Cache
Diffusion Transformer
Storage Efficiency
Junxiang Qiu
University of Science and Technology of China, Hefei, China
Shuo Wang
University of Science and Technology of China, Hefei, China
Jinda Lu
University of Science and Technology of China
Lin Liu
University of Science and Technology of China, Hefei, China
Houcheng Jiang
University of Science and Technology of China
Model editing, LLMs
Yanbin Hao
Hefei University of Technology
Video retrieval, video action recognition, hashing, Video Hyperlinking