Fast-ARDiff: An Entropy-informed Acceleration Framework for Continuous Space Autoregressive Generation

📅 2025-12-09
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
AR-diffusion hybrid models suffer from high inference latency due to sequential autoregressive generation and iterative denoising. This paper proposes Fast-ARDiff, the first end-to-end framework unifying optimization of both AR and diffusion processes. Its core contributions are: (1) entropy-aware speculative decoding, mitigating draft-model overconfidence via entropy alignment; (2) a dynamic scheduling mechanism that jointly coordinates AR speculation steps and diffusion denoising steps; and (3) joint trajectory- and distribution-matching knowledge distillation, enhanced by shallow-feature pre-filtering to enable cross-paradigm co-optimization. Evaluated on ImageNet 256×256, Fast-ARDiff achieves a 4.3× lossless speedup; on text generation tasks, it delivers a 3× acceleration. These improvements significantly reduce end-to-end inference latency while preserving generation quality.
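To make the entropy-alignment idea concrete, here is a minimal sketch of standard speculative-decoding acceptance together with an entropy comparison between draft and target distributions. This is an illustration of the general technique the summary describes, not the paper's implementation; all function names and the toy distributions are assumptions.

```python
import numpy as np

def entropy(p):
    """Shannon entropy of a probability vector."""
    p = np.clip(p, 1e-12, 1.0)
    return -np.sum(p * np.log(p))

def speculative_accept(draft_probs, target_probs, token, rng):
    """Standard speculative-decoding acceptance test: accept the drafted
    token with probability min(1, p_target / p_draft)."""
    ratio = target_probs[token] / max(draft_probs[token], 1e-12)
    return bool(rng.random() < min(1.0, ratio))

# Toy example: an overconfident (low-entropy) draft vs. the target.
# The entropy gap below is the kind of mismatch the paper's entropy
# alignment is meant to reduce, since overconfident drafts are rejected
# more often by the acceptance test above.
rng = np.random.default_rng(0)
target = np.array([0.4, 0.3, 0.2, 0.1])
overconfident_draft = np.array([0.97, 0.01, 0.01, 0.01])

print(entropy(overconfident_draft) < entropy(target))  # True: entropy mismatch
```

When the draft places nearly all mass on one token that the target does not favor, the acceptance ratio collapses and the draft is wasted; nudging the draft toward the target's entropy profile raises acceptance rates.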

πŸ“ Abstract
Autoregressive (AR)-diffusion hybrid paradigms combine AR's structured modeling with diffusion's photorealistic synthesis, yet suffer from high latency due to sequential AR generation and iterative denoising. In this work, we tackle this bottleneck and propose a unified AR-diffusion framework, Fast-ARDiff, that jointly optimizes both components, accelerating AR speculative decoding while simultaneously facilitating faster diffusion decoding. Specifically: (1) The entropy-informed speculative strategy encourages the draft model to produce higher-entropy representations aligned with the target model's entropy characteristics, mitigating the entropy mismatch and high rejection rates caused by draft overconfidence. (2) For diffusion decoding, rather than treating it as an independent module, we integrate it into the same end-to-end framework using a dynamic scheduler that prioritizes AR optimization to guide the subsequent diffusion steps. The diffusion part is optimized through a joint distillation framework combining trajectory and distribution matching, ensuring stable training and high-quality synthesis with extremely few steps. During inference, shallow feature entropy from the AR module is used to pre-filter low-entropy drafts, avoiding redundant computation and improving latency. Fast-ARDiff achieves state-of-the-art acceleration across diverse models: on ImageNet 256×256, TransDiff attains 4.3× lossless speedup, and NextStep-1 achieves 3× acceleration on text-conditioned generation. Code will be available at https://github.com/aSleepyTree/Fast-ARDiff.
Problem

Research questions and friction points this paper is trying to address.

Accelerates autoregressive-diffusion hybrid models by reducing sequential generation latency
Mitigates entropy mismatch between draft and target models to lower rejection rates
Optimizes diffusion decoding with joint distillation for fewer synthesis steps
Innovation

Methods, ideas, or system contributions that make the work stand out.

Entropy-informed speculative decoding reduces rejection rates
Dynamic scheduler integrates AR and diffusion in one framework
Joint distillation enables high-quality synthesis with few steps
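The shallow-feature pre-filtering mentioned above can be sketched as a simple entropy gate: draft positions whose predictive entropy falls below a threshold (i.e. overconfident drafts likely to be rejected) are filtered out before the expensive target-model verification pass. This is a hedged illustration under assumed names and a hand-picked threshold, not the paper's actual filtering rule.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def prefilter_drafts(draft_logits, entropy_threshold):
    """Return indices of draft positions worth sending to target-model
    verification: those whose predictive entropy meets the threshold.
    Low-entropy (overconfident) drafts are filtered out up front,
    avoiding redundant verification compute."""
    keep = []
    for i, logits in enumerate(draft_logits):
        p = softmax(np.asarray(logits, dtype=float))
        h = -np.sum(p * np.log(np.clip(p, 1e-12, 1.0)))
        if h >= entropy_threshold:
            keep.append(i)
    return keep

# A near-uniform draft (high entropy) passes; a sharply peaked one is dropped.
drafts = [np.zeros(4), np.array([10.0, 0.0, 0.0, 0.0])]
print(prefilter_drafts(drafts, entropy_threshold=1.0))  # [0]
```

In the paper's setting the entropy signal comes from shallow AR features rather than full output logits, but the gating logic is the same: spend verification only where the draft is uncertain enough to be informative.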
Zhen Zou
University of Science and Technology of China

Xiaoxiao Ma
Oracle, Macquarie University

Jie Huang
University of Science and Technology of China, JD Joy future AI

Zichao Yu
University of Science and Technology of China

Feng Zhao
University of Science and Technology of China