Plug-and-Play Fidelity Optimization for Diffusion Transformer Acceleration via Cumulative Error Minimization

📅 2025-12-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
Diffusion Transformers (DiTs) suffer from slow inference due to iterative denoising. Existing training-free caching accelerations are efficient but accumulate substantial error, and their fixed caching policies fail to adapt to how error evolves during denoising. This paper proposes CEM, a plug-and-play fidelity-optimization plugin built on the first cumulative error model that is jointly sensitive to timesteps and cache intervals. Guided by this differentiable prior, CEM uses dynamic programming to derive adaptive caching strategies that minimize reconstruction error. CEM requires no model modification, incurs zero training overhead, is compatible with quantized models, and adapts to arbitrary acceleration budgets. Evaluated across nine generative models, including FLUX.1-dev, PixArt-α, Stable Diffusion 1.5, and Hunyuan, and multiple quantization schemes, CEM significantly improves generation fidelity and, in several configurations, even surpasses the original unaccelerated models.

📝 Abstract
Although the Diffusion Transformer (DiT) has emerged as a predominant architecture for image and video generation, its iterative denoising process results in slow inference, which hinders broader applicability and development. Caching-based methods achieve training-free acceleration but suffer from considerable computational error. Existing methods typically incorporate error correction strategies such as pruning or prediction to mitigate this error. However, their fixed caching strategies fail to adapt to the complex error variations during denoising, which limits the full potential of error correction. To tackle this challenge, we propose CEM, a novel fidelity-optimization plugin for existing error correction methods based on cumulative error minimization. CEM predefines an error prior that characterizes the model's sensitivity to acceleration as jointly influenced by timesteps and cache intervals. Guided by this prior, we formulate a dynamic programming algorithm with cumulative error approximation that optimizes the caching strategy, minimizing caching error and substantially improving generation fidelity. CEM is model-agnostic, generalizes well, and adapts to arbitrary acceleration budgets. It can be seamlessly integrated into existing error correction frameworks and quantized models without introducing any additional computational overhead. Extensive experiments on nine generation models and quantization methods across three tasks demonstrate that CEM significantly improves the generation fidelity of existing acceleration models, and even outperforms the original generation performance on FLUX.1-dev, PixArt-α, Stable Diffusion 1.5, and Hunyuan. The code will be made publicly available.
Problem

Research questions and friction points this paper is trying to address.

Iterative denoising makes DiT inference slow, hindering broader applicability
Training-free caching acceleration accumulates considerable computational error
Fixed caching strategies cannot adapt to how error evolves across denoising timesteps
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic programming with cumulative error approximation derives caching schedules that minimize accumulated error
Model-agnostic plugin that integrates seamlessly into existing error correction frameworks and quantized models
Predefined error prior captures joint timestep and cache-interval sensitivity to acceleration, enabling budget-aware adaptation
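The schedule-optimization idea above can be sketched as a small dynamic program. This is a minimal illustrative sketch, not the paper's implementation: the error table `err`, the `budget` parameter, and the assumption that a full computation contributes (near-)zero error are all stand-ins for CEM's predefined, timestep- and interval-sensitive error prior.

```python
def optimal_cache_schedule(err, budget):
    """Pick which denoising timesteps recompute the DiT vs. reuse cached
    features, minimizing a cumulative-error proxy.

    err[t][k]: hypothetical error of using features that are k steps stale
               at timestep t (err[t][0] is a full computation).
    budget:    maximum number of full DiT evaluations (assumed >= 1).
    Returns (total_error, compute_steps)."""
    T = len(err)
    INF = float("inf")
    # dp[(b, k)] = (cumulative error, parent state, action) after the
    # current timestep, with b full computes used and cache staleness k.
    dp = {(0, 0): (0.0, None, None)}  # virtual state before step 0
    history = []
    for t in range(T):
        nxt = {}
        for (b, k), (cost, _, _) in dp.items():
            # Option 1: recompute at t (resets staleness, spends budget).
            if b + 1 <= budget:
                c = cost + err[t][0]
                if c < nxt.get((b + 1, 0), (INF,))[0]:
                    nxt[(b + 1, 0)] = (c, (b, k), "compute")
            # Option 2: reuse the cache, now one step staler
            # (only valid once something has been computed and cached).
            if b > 0 and k + 1 < len(err[t]):
                c = cost + err[t][k + 1]
                if c < nxt.get((b, k + 1), (INF,))[0]:
                    nxt[(b, k + 1)] = (c, (b, k), "cache")
        history.append(nxt)
        dp = nxt
    # Best final state, then backtrack to recover the schedule.
    best = min(dp, key=lambda s: dp[s][0])
    total, compute_steps, state = dp[best][0], [], best
    for t in range(T - 1, -1, -1):
        _, parent, action = history[t][state]
        if action == "compute":
            compute_steps.append(t)
        state = parent
    return total, sorted(compute_steps)
```

With a toy error table where staleness k costs k at every timestep, e.g. `optimal_cache_schedule([[0, 1, 2, 3, 4]] * 4, budget=2)`, the program spreads the two full computations out (steps 0 and 2) rather than spending them back-to-back, which is exactly the adaptivity a fixed-interval caching policy lacks.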