Drift-AR: Single-Step Visual Autoregressive Generation via Anti-Symmetric Drifting

📅 2026-03-30
🤖 AI Summary
Existing hybrid generative models that combine the autoregressive and diffusion paradigms generate slowly because of two bottlenecks: sequential autoregressive modeling and multi-step visual decoding. This work proposes a unified acceleration framework grounded in predictive entropy. Position-level predictive entropy in continuous space is used both to improve draft quality in speculative decoding and to control single-step visual decoding. The framework introduces entropy-informed speculative decoding trained with a causal-normalized entropy loss, together with an anti-symmetric drifting field for the visual decoder. Both stages share one entropy signal that is computed once at no extra cost. Experiments on MAR, TransDiff, and NextStep-1 show speedups of 3.8–5.5× while matching or surpassing the generation quality of the original multi-step methods under single-step (1-NFE) decoding.
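The summary does not say how entropy is computed for continuous tokens or how speculative acceptance is scored. As a minimal sketch, assume each position's predictive distribution is a 1-D Gaussian whose log-variance the AR model outputs (an assumption; the models evaluated here may parameterize it differently). Entropy is then closed form and can be computed once per position, and the standard speculative-sampling acceptance rule min(1, p(x)/q(x)) shows why a draft whose entropy mismatches the target's gets rejected more often. The names `gaussian_entropy`, `gaussian_pdf`, and `acceptance_rate` are hypothetical.

```python
import math
import random


def gaussian_entropy(log_var):
    """Differential entropy (nats) of a 1-D Gaussian with log-variance `log_var`."""
    # H = 0.5 * log(2 * pi * e * sigma^2)
    return 0.5 * (math.log(2 * math.pi * math.e) + log_var)


def gaussian_pdf(x, mu, var):
    """Density of N(mu, var) at x."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def acceptance_rate(mu_t, var_t, mu_d, var_d, n=20000, seed=0):
    """Monte Carlo estimate of the speculative-sampling acceptance rate at one
    position: draw x from the draft q = N(mu_d, var_d) and accept it with
    probability min(1, p(x) / q(x)), where p = N(mu_t, var_t) is the target.
    (Resampling from the residual max(0, p - q) after a rejection is omitted,
    since it does not affect the acceptance rate.)"""
    rng = random.Random(seed)
    accepted = 0
    for _ in range(n):
        x = rng.gauss(mu_d, math.sqrt(var_d))
        ratio = gaussian_pdf(x, mu_t, var_t) / gaussian_pdf(x, mu_d, var_d)
        if rng.random() < min(1.0, ratio):
            accepted += 1
    return accepted / n


# Entropy per position, computed once from hypothetical predicted log-variances.
log_vars = [math.log(0.05), math.log(1.0), math.log(0.3)]
entropies = [gaussian_entropy(lv) for lv in log_vars]

# Same target N(0, 1); one draft with matched variance, one overconfident
# draft whose lower variance means lower entropy than the target.
matched = acceptance_rate(0.0, 1.0, 0.0, 1.0)
overconfident = acceptance_rate(0.0, 1.0, 0.0, 0.1)
print(entropies, matched, overconfident)
```

With matched variance every draft is accepted, while the overconfident draft is accepted only about half the time (the expected rate equals the overlap ∫min(p, q) dx). This mirrors the claim that draft–target entropy mismatch causes excessive rejection; the paper's causal-normalized entropy loss itself is not reproduced here.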
📝 Abstract
Autoregressive (AR)-diffusion hybrid paradigms combine AR's structured semantic modeling with diffusion's high-fidelity synthesis, yet they suffer from a dual speed bottleneck: the sequential AR stage and the iterative multi-step denoising of the diffusion-based visual decoding stage. Existing methods address each bottleneck in isolation, without a unified design principle. We observe that the per-position prediction entropy of continuous-space AR models naturally encodes spatially varying generation uncertainty. This entropy simultaneously governs draft prediction quality in the AR stage and reflects the corrective effort required by the visual decoding stage, a connection that has not been fully explored before. Since entropy is inherently tied to both bottlenecks, it serves as a natural unifying signal for joint acceleration. In this work, we propose Drift-AR, which leverages this entropy signal to accelerate both stages: 1) for AR acceleration, we introduce Entropy-Informed Speculative Decoding, which aligns draft–target entropy distributions via a causal-normalized entropy loss, resolving the entropy mismatch that causes excessive draft rejection; 2) for visual decoder acceleration, we reinterpret entropy as the physical variance of the initial state for an anti-symmetric drifting field (high-entropy positions activate stronger drift toward the data manifold, while low-entropy positions yield vanishing drift), enabling single-step (1-NFE) decoding without iterative denoising or distillation. Moreover, both stages share the same entropy signal, which is computed once at no extra cost. Experiments on MAR, TransDiff, and NextStep-1 demonstrate 3.8–5.5× speedups with genuine 1-NFE decoding, matching or surpassing the original models' quality. Code will be available at https://github.com/aSleepyTree/Drift-AR.
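The abstract does not give the form of the drifting field. The sketch below assumes one simple anti-symmetric choice: a Gaussian-kernel mean shift toward "positive" (data) samples minus the same shift toward "negative" (currently generated) samples. It maps entropy back to an initial-state variance by inverting the 1-D Gaussian entropy formula. All function names, the bandwidth, and the explicit one-step update are hypothetical illustrations, not the paper's decoder. The point is the anti-symmetry property V(x; P, Q) = -V(x; Q, P), under which the drift vanishes when the two sample sets coincide.

```python
import math
import random


def entropy_to_variance(h):
    """Invert the 1-D Gaussian entropy H = 0.5 * log(2*pi*e*var) (assumed mapping)."""
    return math.exp(2.0 * h) / (2.0 * math.pi * math.e)


def mean_shift(x, samples, bandwidth=0.5):
    """Gaussian-kernel-weighted mean displacement from x toward `samples`."""
    weights = [math.exp(-((y - x) ** 2) / (2 * bandwidth ** 2)) for y in samples]
    total = sum(weights)
    if total == 0.0:
        return 0.0
    return sum(w * (y - x) for w, y in zip(weights, samples)) / total


def drift(x, positives, negatives, bandwidth=0.5):
    """Anti-symmetric field: attraction to positives minus attraction to negatives.
    Swapping the two sets flips the sign; identical sets give exactly zero drift."""
    return mean_shift(x, positives, bandwidth) - mean_shift(x, negatives, bandwidth)


def one_step_decode(mu, entropy, positives, negatives, rng):
    """Toy single step: sample the initial state with entropy-derived variance
    (higher entropy means a wider spread), then apply the drift once."""
    x0 = rng.gauss(mu, math.sqrt(entropy_to_variance(entropy)))
    return x0 + drift(x0, positives, negatives)


rng = random.Random(0)
data = [rng.gauss(1.0, 0.2) for _ in range(256)]       # stand-in data samples
generated = [rng.gauss(0.0, 0.2) for _ in range(256)]  # stand-in generated samples
x = 0.3
print(drift(x, data, generated), drift(x, generated, data), drift(x, data, data))
```

Here the point x = 0.3 lies between the two clusters, so the drift pushes it toward the data cluster. Swapping the roles of the sets reverses the direction, and when the generated samples already match the data the field is zero.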
Problem

Research questions and friction points this paper is trying to address.

Autoregressive Generation
Diffusion Models
Speed Bottleneck
Visual Decoding
Generation Acceleration
Innovation

Methods, ideas, or system contributions that make the work stand out.

Autoregressive Generation
Diffusion Models
Prediction Entropy
Anti-Symmetric Drifting
Single-Step Decoding