🤖 AI Summary
Diffusion Transformers (DiTs) require many expensive Transformer forward passes during inference. Existing training-free acceleration methods rely on fixed schedules or hand-tuned thresholds, so they struggle to balance computational cost against generation quality adaptively. This work proposes SoftCap, a training-free control layer for cache-based DiT inference that introduces, for the first time, a soft budget mechanism: the computation budget is treated as a flexible upper bound rather than a rigid constraint. SoftCap uses a Trajectory Drift Observer to assess cache risk from lightweight hidden-state statistics, and a PI feedback controller to dynamically adjust the threshold that triggers Full computations. On FLUX.1-dev, SoftCap outperforms SpeCa at comparable FLOPs, raising ImageReward from 0.967 to 0.981 and reducing LPIPS-Full from 0.518 to 0.498.
📝 Abstract
Diffusion Transformers (DiTs) achieve strong visual quality, but their iterative denoising process requires many costly Transformer evaluations. Training-free acceleration methods reduce this cost by caching, forecasting, or verifying intermediate features, yet the runtime decision of when to execute a Full step is often driven by fixed schedules or hand-tuned thresholds. We propose SoftCap, a training-free control layer for cache-based DiT inference. SoftCap couples a Trajectory Drift Observer, which estimates local cache risk from lightweight hidden-state statistics, with a Soft-Budget PI Controller, which adjusts the Full-triggering threshold from realized compute relative to a fixed reference profile. The budget is a soft ceiling: it shapes the threshold but does not require a run to spend a prescribed number of Full evaluations. On FLUX.1-dev, SoftCap improves over SpeCa at a comparable middle-compute operating point, raising ImageReward from 0.967 to 0.981 and reducing LPIPS-Full from 0.518 to 0.498 at nearly identical FLOPs, while target-sweep diagnostics show the intended soft-ceiling behavior as the budget is relaxed.
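The soft-ceiling idea in the abstract can be illustrated with a toy controller. The sketch below is not the paper's implementation: the class name `SoftBudgetPI`, the gains, the base threshold, the linear reference profile, and the scalar `drift` input (standing in for the observer's output) are all assumptions made for illustration. It shows one plausible reading, in which only overspending relative to the reference raises the Full-triggering threshold, so an easy trajectory is never forced to use up its budget.

```python
from dataclasses import dataclass


@dataclass
class SoftBudgetPI:
    """Hypothetical PI controller; all names and gains are illustrative."""
    base_threshold: float = 0.10  # drift level that triggers a Full step
    kp: float = 0.5               # proportional gain (assumed)
    ki: float = 0.05              # integral gain (assumed)
    integral: float = 0.0         # accumulated overspend
    full_steps: int = 0           # Full evaluations executed so far

    def threshold(self, step: int, target_fraction: float) -> float:
        # Assumed linear reference profile: Full steps allowed by this step.
        reference = target_fraction * (step + 1)
        # Soft ceiling: only overspending raises the threshold. Underspending
        # is clipped to zero, so the run is never pushed to spend its budget.
        error = max(0.0, self.full_steps - reference)
        self.integral += error
        return self.base_threshold + self.kp * error + self.ki * self.integral

    def decide(self, step: int, drift: float, target_fraction: float) -> bool:
        # Run a Full step only when the observed drift exceeds the threshold.
        run_full = drift > self.threshold(step, target_fraction)
        if run_full:
            self.full_steps += 1
        return run_full


def simulate(drifts, target_fraction):
    """Return the per-step Full/cached decisions for a sequence of drift values."""
    ctrl = SoftBudgetPI()
    return [ctrl.decide(t, d, target_fraction) for t, d in enumerate(drifts)]


if __name__ == "__main__":
    risky = [0.3] * 10  # drift always above the base threshold
    easy = [0.05] * 10  # drift always below the base threshold
    print("tight budget, risky:", sum(simulate(risky, 0.2)))
    print("loose budget, risky:", sum(simulate(risky, 1.0)))
    print("loose budget, easy: ", sum(simulate(easy, 1.0)))
```

In this toy setting, tightening `target_fraction` reduces the number of Full steps on a high-drift trajectory, while a low-drift trajectory triggers none even under a generous budget. The integral term here never decays, which a practical controller would likely need to address.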