Just-in-Time: Training-Free Spatial Acceleration for Diffusion Transformers

📅 2026-03-11
🤖 AI Summary
Diffusion Transformers suffer from high computational costs in image generation, and existing acceleration methods predominantly focus on the temporal domain while overlooking spatial redundancy. This work proposes the first training-free, spatial-domain dynamic sparse acceleration framework: it dynamically selects sparse anchor tokens to construct a spatially approximated ordinary differential equation (ODE) and introduces deterministic micro-flows to ensure structural coherence and statistical fidelity of newly generated tokens. By transcending the limitations of conventional temporal-domain acceleration, the method achieves up to 7× inference speedup on the FLUX.1-dev model with negligible degradation in generation quality, substantially outperforming current state-of-the-art approaches.
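The core idea in the summary — evaluating the expensive transformer only on a dynamically selected sparse subset of anchor tokens and reusing cached velocities elsewhere — can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the paper's implementation: the saliency rule (cached-velocity magnitude), the Euler integrator, and the `velocity_fn` stand-in for the transformer call are all hypothetical.

```python
import numpy as np

def jit_sparse_step(x, velocity_fn, cached_v, dt, keep_ratio=0.25):
    """One Euler step of a spatially sparse ODE update (illustrative sketch).

    x          : (N, D) latent tokens
    velocity_fn: maps (M, D) anchor tokens -> (M, D) velocities; a cheap
                 stand-in for the expensive transformer evaluation
    cached_v   : (N, D) velocities from the previous step, reused as-is
                 for non-anchor tokens
    """
    n_tokens = x.shape[0]
    k = max(1, int(keep_ratio * n_tokens))
    # Hypothetical saliency heuristic: tokens whose cached velocity is
    # largest are assumed to be changing fastest, so they are re-evaluated.
    saliency = np.linalg.norm(cached_v, axis=1)
    anchors = np.argsort(-saliency)[:k]
    v = cached_v.copy()
    v[anchors] = velocity_fn(x[anchors])  # sparse model evaluation only
    return x + dt * v, v                  # advance the full latent state
```

Because only `k` of `N` tokens pass through the model per step, the per-step cost drops roughly in proportion to `keep_ratio`, which is the source of the reported speedup.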

📝 Abstract
Diffusion Transformers have established a new state-of-the-art in image synthesis, but the high computational cost of iterative sampling severely hampers their practical deployment. While existing acceleration methods often focus on the temporal domain, they overlook the substantial spatial redundancy inherent in the generative process, where global structures emerge long before fine-grained details are formed. The uniform computational treatment of all spatial regions represents a critical inefficiency. In this paper, we introduce Just-in-Time (JiT), a novel training-free framework that addresses this challenge through acceleration in the spatial domain. JiT formulates a spatially approximated generative ordinary differential equation (ODE) that drives the full latent state evolution based on computations from a dynamically selected, sparse subset of anchor tokens. To ensure seamless transitions as new tokens are incorporated to expand the dimensions of the latent state, we propose a deterministic micro-flow, a simple and effective finite-time ODE that maintains both structural coherence and statistical correctness. Extensive experiments on the state-of-the-art FLUX.1-dev model demonstrate that JiT achieves up to a 7x speedup with nearly lossless performance, significantly outperforming existing acceleration methods and establishing a new and superior trade-off between inference speed and generation fidelity.
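The abstract's "deterministic micro-flow" — a finite-time ODE that carries a newly incorporated token to statistical and structural consistency with the existing latent — can be illustrated with a minimal sketch. The straight-line (rectified-flow-style) velocity and the fixed `[0, 1]` integration window below are assumptions for illustration; the paper's exact flow construction may differ.

```python
import numpy as np

def micro_flow(x_new, x_target, n_steps=4):
    """Deterministic finite-time flow for a newly added token (sketch).

    Transports x_new to x_target by integrating the constant straight-line
    velocity v = x_target - x_new over t in [0, 1] with n_steps Euler steps.
    Deterministic: the endpoint depends only on the inputs, with no sampling.
    """
    x = x_new.copy()
    v = x_target - x_new   # constant velocity of the linear probability path
    dt = 1.0 / n_steps
    for _ in range(n_steps):
        x = x + dt * v     # each micro-step moves the token toward the target
    return x
```

With a constant velocity the integration is exact at `t = 1`, so the new token lands precisely on its target state regardless of `n_steps`; in practice the appeal of such a micro-flow is that it is cheap (a few steps on one token) relative to re-running the full sampler.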
Problem

Research questions and friction points this paper is trying to address.

Diffusion Transformers
spatial redundancy
computational cost
image synthesis
iterative sampling
Innovation

Methods, ideas, or system contributions that make the work stand out.

training-free acceleration
spatial redundancy
diffusion transformers
generative ODE
deterministic micro-flow