Accelerating Diffusion Transformer via Error-Optimized Cache

📅 2025-01-31
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the slow sampling of DiT-based diffusion models and the quality degradation caused by cache acceleration, this paper proposes an Error-Optimized Cache (EOC) mechanism. EOC introduces an error-modeling-driven caching optimization framework comprising three core components: prior knowledge extraction, a criterion for judging which caching steps need optimization, and active suppression of caching errors. By correcting the errors that feature reuse introduces, EOC significantly mitigates the quality degradation caused by over-caching without appreciable computational overhead. On ImageNet, EOC improves FID from 6.857 to 5.821 (over-caching), from 3.870 to 3.692 (rule-based caching), and from 3.539 to 3.451 (training-based caching), demonstrating a compelling trade-off between generation quality and inference efficiency.

📝 Abstract
Diffusion Transformer (DiT) is a crucial method for content generation. However, it requires substantial time to sample. Many studies have attempted to use caching to reduce the time consumption of sampling. Existing caching methods accelerate generation by reusing DiT features from the previous time step and skipping calculations in the next, but they tend to locate and cache low-error modules without focusing on reducing caching-induced errors, resulting in a sharp decline in generated content quality as caching intensity increases. To solve this problem, we propose the Error-Optimized Cache (EOC). This method introduces three key improvements: (1) Prior knowledge extraction: extract and process the caching differences; (2) A judgment method for cache optimization: determine whether certain caching steps need to be optimized; (3) Cache optimization: reduce caching errors. Experiments show that this algorithm significantly reduces the error accumulation caused by caching (especially over-caching). On the ImageNet dataset, without significantly increasing the computational burden, this method improves the quality of the generated images under the over-caching, rule-based, and training-based methods. Specifically, the Fréchet Inception Distance (FID) values are improved as follows: from 6.857 to 5.821, from 3.870 to 3.692, and from 3.539 to 3.451, respectively.
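The caching scheme the abstract describes — reuse a block's features from the previous time step, skip the computation, then suppress the reuse error — can be sketched roughly as follows. This is an illustrative toy under assumed names, not the authors' implementation: `CachedBlock`, the 0.5 damping factor, and the difference-based correction rule are all assumptions standing in for EOC's learned components.

```python
# Illustrative sketch (not the paper's code): time-step feature caching
# for one transformer block, with a simple error-suppression term in the
# spirit of EOC. The correction rule and damping factor are assumptions.
import numpy as np


class CachedBlock:
    """Wraps a block function; reuses the cached output on skip steps."""

    def __init__(self, block_fn):
        self.block_fn = block_fn
        self.cached = None       # feature saved at the last full step
        self.correction = None   # drift observed between consecutive full steps

    def __call__(self, x, full_compute):
        if full_compute or self.cached is None:
            out = self.block_fn(x)
            if self.cached is not None:
                # "prior knowledge extraction": record how far the cached
                # feature drifted from the freshly computed one
                self.correction = out - self.cached
            self.cached = out
            return out
        # skip step: reuse the cache, damping the expected caching error
        if self.correction is not None:
            return self.cached + 0.5 * self.correction
        return self.cached


def demo():
    block = CachedBlock(lambda x: 2.0 * x)        # stand-in for a DiT block
    a = block(np.full(4, 1.0), full_compute=True)  # full step: 2.0
    b = block(np.full(4, 1.1), full_compute=True)  # full step: 2.2, drift 0.2
    c = block(np.full(4, 1.2), full_compute=False) # skip: 2.2 + 0.5*0.2 = 2.3
    return a, b, c
```

On the skip step the corrected output (2.3) lands closer to the true value (2.4) than the raw cached feature (2.2) would, which is the intuition behind suppressing, rather than merely avoiding, caching errors.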
Problem

Research questions and friction points this paper is trying to address.

Diffusion Transformers
Time Efficiency
Quality Degradation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Error-Optimized Cache
Diffusion Transformer
Storage Efficiency
Junxiang Qiu
University of Science and Technology of China, Hefei, China
Shuo Wang
University of Science and Technology of China, Hefei, China
Jinda Lu
University of Science and Technology of China
Lin Liu
University of Science and Technology of China, Hefei, China
Houcheng Jiang
University of Science and Technology of China
Model editing, LLMs
Yanbin Hao
Hefei University of Technology
Video retrieval, video action recognition, hashing, Video Hyperlinking