Elastic Diffusion Transformer

📅 2026-02-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes an Elastic Diffusion Transformer framework to address the high computational cost of diffusion models; existing acceleration methods, constrained by fixed computational budgets, struggle to balance speed against generation quality. The key innovation is a sample-adaptive dynamic sparsity mechanism: during inference, a lightweight router dynamically skips redundant modules and adjusts MLP width on the fly. Block-level feature caching is additionally integrated to eliminate redundant computation. Evaluated on both 2D image and 3D asset generation tasks, the proposed method achieves roughly a 2× speedup while preserving near-original generation quality.
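The per-block mechanism described above (a lightweight router that decides whether to skip a block entirely and, if not, how much MLP width to keep) can be sketched as follows. All class and variable names here are hypothetical illustrations of the idea, not the paper's actual code; the width ratios, pooling scheme, and attention configuration are assumptions.

```python
import torch
import torch.nn as nn

class Router(nn.Module):
    """Hypothetical lightweight router: from the input latent it predicts
    (a) whether the whole block can be skipped and (b) if not, which
    MLP width-reduction ratio to use."""
    def __init__(self, dim, num_width_choices):
        super().__init__()
        self.skip_head = nn.Linear(dim, 1)                    # skip logit
        self.width_head = nn.Linear(dim, num_width_choices)   # width-ratio logits

    def forward(self, x):
        pooled = x.mean(dim=1)                                # pool over tokens
        skip = torch.sigmoid(self.skip_head(pooled)) > 0.5    # per-sample skip flag
        width_idx = self.width_head(pooled).argmax(dim=-1)    # chosen width ratio
        return skip, width_idx

class ElasticBlock(nn.Module):
    """A DiT-style block whose MLP width is chosen at inference time by
    the router (structured sparsity via weight slicing). Illustrative only."""
    def __init__(self, dim, mlp_ratio=4, width_ratios=(0.25, 0.5, 0.75, 1.0)):
        super().__init__()
        hidden = dim * mlp_ratio
        self.width_ratios = width_ratios
        self.router = Router(dim, len(width_ratios))
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.fc1 = nn.Linear(dim, hidden)
        self.fc2 = nn.Linear(hidden, dim)

    def forward(self, x):
        skip, width_idx = self.router(x)
        if skip.all():
            return x  # block skipped: identity (or a cached residual, see caching)
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        # Slice the MLP to the predicted width; for simplicity this sketch
        # applies the first sample's choice to the whole batch.
        keep = int(self.fc1.out_features * self.width_ratios[int(width_idx[0])])
        h = self.norm2(x)
        h = torch.relu(h @ self.fc1.weight[:keep].T + self.fc1.bias[:keep])
        x = x + h @ self.fc2.weight[:, :keep].T + self.fc2.bias
        return x
```

In this sketch the "elastic" width costs nothing extra to train separate models for: a single weight matrix is sliced to the chosen width, so every ratio shares parameters.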

📝 Abstract
Diffusion Transformers (DiT) have demonstrated remarkable generative capabilities but remain highly computationally expensive. Previous acceleration methods, such as pruning and distillation, typically rely on a fixed computational capacity, leading to insufficient acceleration and degraded generation quality. To address this limitation, we propose Elastic Diffusion Transformer (E-DiT), an adaptive acceleration framework for DiT that effectively improves efficiency while maintaining generation quality. Specifically, we observe that the generative process of DiT exhibits substantial sparsity (i.e., some computations can be skipped with minimal impact on quality), and this sparsity varies significantly across samples. Motivated by this observation, E-DiT equips each DiT block with a lightweight router that dynamically identifies sample-dependent sparsity from the input latent. Each router adaptively determines whether the corresponding block can be skipped. If the block is not skipped, the router then predicts the optimal MLP width reduction ratio within the block. During inference, we further introduce a block-level feature caching mechanism that leverages router predictions to eliminate redundant computations in a training-free manner. Extensive experiments on 2D image generation (Qwen-Image and FLUX) and 3D asset generation (Hunyuan3D-3.0) demonstrate the effectiveness of E-DiT, achieving up to ~2× speedup with negligible loss in generation quality. Code will be available at https://github.com/wangjiangshan0725/Elastic-DiT.
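The training-free, block-level feature caching the abstract mentions can be sketched as below. The paper only states that cached features replace redundant computation when the router allows it; the specific scheme here (caching each block's residual contribution from an earlier denoising step and re-adding it when the block is skipped) is one plausible realization, and all names are hypothetical.

```python
import torch

class BlockCache:
    """Illustrative training-free block-level feature cache. At one
    denoising step, store each block's residual contribution
    (output minus input); when the router later marks that block
    skippable, approximate it by re-adding the cached residual
    instead of recomputing the block."""
    def __init__(self):
        self.residuals = {}

    def update(self, block_idx, x_in, x_out):
        # Cache the block's contribution at the current step.
        self.residuals[block_idx] = (x_out - x_in).detach()

    def reuse(self, block_idx, x_in):
        # Approximate a skipped block; fall back to identity if
        # nothing has been cached for this block yet.
        delta = self.residuals.get(block_idx)
        return x_in if delta is None else x_in + delta
```

Because the cache only reads tensors produced during normal inference, it requires no retraining, which matches the abstract's "training-free" claim.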
Problem

Research questions and friction points this paper is trying to address.

Diffusion Transformers
computational efficiency
generation quality
acceleration
sparsity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Diffusion Transformer
adaptive acceleration
dynamic sparsity
lightweight router
feature caching