Fast-ARDiff: An Entropy-informed Acceleration Framework for Continuous Space Autoregressive Generation

📅 2025-12-09
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
AR-diffusion hybrid models suffer from high inference latency due to sequential autoregressive generation and iterative denoising. This paper proposes Fast-ARDiff, the first end-to-end framework unifying optimization of both AR and diffusion processes. Its core contributions are: (1) entropy-aware speculative decoding, mitigating draft-model overconfidence via entropy alignment; (2) a dynamic scheduling mechanism that jointly coordinates AR speculation steps and diffusion denoising steps; and (3) joint trajectory- and distribution-matching knowledge distillation, enhanced by shallow-feature pre-filtering to enable cross-paradigm co-optimization. Evaluated on ImageNet 256×256, Fast-ARDiff achieves a 4.3× lossless speedup; on text generation tasks, it delivers a 3× acceleration. These improvements significantly reduce end-to-end inference latency while preserving generation quality.
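To make the entropy-alignment idea concrete, here is a minimal sketch of standard speculative-decoding acceptance together with an entropy comparison between draft and target distributions. This is an illustration of the general technique the summary describes, not the paper's implementation; all function names and the toy distributions are assumptions.

```python
import numpy as np

def entropy(p):
    """Shannon entropy of a probability vector."""
    p = np.clip(p, 1e-12, 1.0)
    return -np.sum(p * np.log(p))

def speculative_accept(draft_probs, target_probs, token, rng):
    """Standard speculative-decoding acceptance test: accept the drafted
    token with probability min(1, p_target / p_draft)."""
    ratio = target_probs[token] / max(draft_probs[token], 1e-12)
    return bool(rng.random() < min(1.0, ratio))

# Toy example: an overconfident (low-entropy) draft vs. the target.
# The entropy gap below is the kind of mismatch the paper's entropy
# alignment is meant to reduce, since overconfident drafts are rejected
# more often by the acceptance test above.
rng = np.random.default_rng(0)
target = np.array([0.4, 0.3, 0.2, 0.1])
overconfident_draft = np.array([0.97, 0.01, 0.01, 0.01])

print(entropy(overconfident_draft) < entropy(target))  # True: entropy mismatch
```

When the draft places nearly all mass on one token that the target does not favor, the acceptance ratio collapses and the draft is wasted; nudging the draft toward the target's entropy profile raises acceptance rates.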

πŸ“ Abstract
Autoregressive (AR)-diffusion hybrid paradigms combine AR's structured modeling with diffusion's photorealistic synthesis, yet suffer from high latency due to sequential AR generation and iterative denoising. In this work, we tackle this bottleneck and propose a unified AR-diffusion framework, Fast-ARDiff, that jointly optimizes both components, accelerating AR speculative decoding while simultaneously facilitating faster diffusion decoding. Specifically: (1) The entropy-informed speculative strategy encourages the draft model to produce higher-entropy representations aligned with the target model's entropy characteristics, mitigating the entropy mismatch and high rejection rates caused by draft overconfidence. (2) For diffusion decoding, rather than treating it as an independent module, we integrate it into the same end-to-end framework using a dynamic scheduler that prioritizes AR optimization to guide the subsequent diffusion steps. The diffusion part is optimized through a joint distillation framework combining trajectory and distribution matching, ensuring stable training and high-quality synthesis with extremely few steps. During inference, shallow feature entropy from the AR module is used to pre-filter low-entropy drafts, avoiding redundant computation and improving latency. Fast-ARDiff achieves state-of-the-art acceleration across diverse models: on ImageNet 256×256, TransDiff attains 4.3× lossless speedup, and NextStep-1 achieves 3× acceleration on text-conditioned generation. Code will be available at https://github.com/aSleepyTree/Fast-ARDiff.
Problem

Research questions and friction points this paper is trying to address.

Accelerates autoregressive-diffusion hybrid models by reducing sequential generation latency
Mitigates entropy mismatch between draft and target models to lower rejection rates
Optimizes diffusion decoding with joint distillation for fewer synthesis steps
Innovation

Methods, ideas, or system contributions that make the work stand out.

Entropy-informed speculative decoding reduces rejection rates
Dynamic scheduler integrates AR and diffusion in one framework
Joint distillation enables high-quality synthesis with few steps
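The shallow-feature pre-filtering mentioned above can be sketched as a simple entropy gate: draft positions whose predictive entropy falls below a threshold (i.e. overconfident drafts likely to be rejected) are filtered out before the expensive target-model verification pass. This is a hedged illustration under assumed names and a hand-picked threshold, not the paper's actual filtering rule.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def prefilter_drafts(draft_logits, entropy_threshold):
    """Return indices of draft positions worth sending to target-model
    verification: those whose predictive entropy meets the threshold.
    Low-entropy (overconfident) drafts are filtered out up front,
    avoiding redundant verification compute."""
    keep = []
    for i, logits in enumerate(draft_logits):
        p = softmax(np.asarray(logits, dtype=float))
        h = -np.sum(p * np.log(np.clip(p, 1e-12, 1.0)))
        if h >= entropy_threshold:
            keep.append(i)
    return keep

# A near-uniform draft (high entropy) passes; a sharply peaked one is dropped.
drafts = [np.zeros(4), np.array([10.0, 0.0, 0.0, 0.0])]
print(prefilter_drafts(drafts, entropy_threshold=1.0))  # [0]
```

In the paper's setting the entropy signal comes from shallow AR features rather than full output logits, but the gating logic is the same: spend verification only where the draft is uncertain enough to be informative.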
Zhen Zou
University of Science and Technology of China

Xiaoxiao Ma
Oracle, Macquarie University

Jie Huang
University of Science and Technology of China, JD Joy future AI

Zichao Yu
University of Science and Technology of China

Feng Zhao
University of Science and Technology of China