TAP: A Token-Adaptive Predictor Framework for Training-Free Diffusion Acceleration

📅 2026-03-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
Diffusion model inference is hindered by the computational cost of multi-step full denoising iterations. This work proposes a training-free, probe-driven framework that dynamically selects the optimal predictor for each token at every sampling step. By leveraging a single forward pass through the model’s initial layer as a low-cost probe, the method evaluates candidate predictors via a proxy loss and enables token-level adaptive scheduling. This approach is the first to enable dynamic, per-token predictor selection and is compatible with diverse predictor designs—including Taylor expansion–based multi-order, multi-step predictors. It consistently outperforms fixed global predictors and cache-only baselines across various diffusion architectures and generation tasks, achieving substantial speedups with negligible degradation in output quality.

📝 Abstract
Diffusion models achieve strong generative performance but remain slow at inference due to the need for repeated full-model denoising passes. We present Token-Adaptive Predictor (TAP), a training-free, probe-driven framework that adaptively selects a predictor for each token at every sampling step. TAP uses a single full evaluation of the model's first layer as a low-cost probe to compute proxy losses for a compact family of candidate predictors (instantiated primarily with Taylor expansions of varying order and horizon), then assigns each token the predictor with the smallest proxy error. This per-token "probe-then-select" strategy exploits heterogeneous temporal dynamics, requires no additional training, and is compatible with various predictor designs. TAP incurs negligible overhead while enabling large speedups with little or no perceptual quality loss. Extensive experiments across multiple diffusion architectures and generation tasks show that TAP substantially improves the accuracy-efficiency frontier compared to fixed global predictors and caching-only baselines.
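The probe-then-select loop described in the abstract can be sketched in a few lines. The snippet below is an illustrative NumPy toy, not the authors' implementation: `taylor_predict` stands in for the paper's Taylor-expansion predictors (here realized as Newton forward-difference extrapolation over cached per-token features), and `probe_layer` / `probe_ref` stand in for the one cheap first-layer pass whose output serves as the proxy-loss reference. All function and variable names are hypothetical.

```python
import numpy as np

def taylor_predict(history, order):
    """Extrapolate the next per-token feature state from cached history
    via Newton forward differences (a discrete stand-in for a Taylor
    predictor; exact for polynomial dynamics up to `order`).
    history: list of (T, D) arrays, most recent last."""
    pred = history[-1].copy()
    diffs = list(history)
    for _ in range(order):
        diffs = [b - a for a, b in zip(diffs[:-1], diffs[1:])]
        if not diffs:
            break  # not enough history for this order
        pred += diffs[-1]
    return pred

def probe_then_select(history, probe_ref, probe_layer, orders=(0, 1, 2)):
    """Per-token predictor selection. `probe_ref` is the output of the
    single cheap probe pass at the current step; each candidate predictor
    is scored by how far probe_layer(prediction) lands from it (the proxy
    loss), and every token keeps the candidate with the smallest error."""
    preds, losses = [], []
    for order in orders:
        pred = taylor_predict(history, order)
        preds.append(pred)
        losses.append(np.linalg.norm(probe_layer(pred) - probe_ref, axis=-1))
    best = np.argmin(np.stack(losses), axis=0)   # (T,) per-token choice
    preds = np.stack(preds)                      # (num_orders, T, D)
    tokens = np.arange(preds.shape[1])
    return preds[best, tokens], best             # (T, D) mixed prediction
```

Because the probe runs once per step regardless of how many candidate predictors are scored, the selection overhead stays small while tokens with fast-changing dynamics can fall back to higher-order (or shorter-horizon) predictors.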
Problem

Research questions and friction points this paper is trying to address.

diffusion models
inference acceleration
token-adaptive prediction
training-free acceleration
generative modeling
Innovation

Methods, ideas, or system contributions that make the work stand out.

training-free
token-adaptive
diffusion acceleration
predictor selection
proxy loss