Towards Better & Faster Autoregressive Image Generation: From the Perspective of Entropy

📅 2025-10-10
🤖 AI Summary
Image tokens in autoregressive (AR) generation carry less information per token than text tokens, and that information is unevenly distributed across the image, which limits both generation quality and decoding speed. To address this, we propose an entropy-informed decoding strategy with two main components: (1) dynamic temperature control guided by the spatial entropy of the token distributions, which balances content diversity, alignment accuracy, and structural coherence without extra computational overhead; and (2) entropy-aware acceptance rules for speculative decoding, which achieve near-lossless generation at about 85% of the inference cost of conventional acceleration methods. The temperature control applies to both mask-based and scale-wise AR models. Experiments across multiple benchmarks and diverse AR image generation models show improvements in both generation quality and sampling speed.

📝 Abstract
In this work, we first revisit the sampling issues in current autoregressive (AR) image generation models and identify that image tokens, unlike text tokens, exhibit lower information density and non-uniform spatial distribution. Accordingly, we present an entropy-informed decoding strategy that facilitates higher autoregressive generation quality with faster synthesis speed. Specifically, the proposed method introduces two main innovations: 1) dynamic temperature control guided by spatial entropy of token distributions, enhancing the balance between content diversity, alignment accuracy, and structural coherence in both mask-based and scale-wise models, without extra computational overhead, and 2) entropy-aware acceptance rules in speculative decoding, achieving near-lossless generation at about 85% of the inference cost of conventional acceleration methods. Extensive experiments across multiple benchmarks using diverse AR image generation models demonstrate the effectiveness and generalizability of our approach in enhancing both generation quality and sampling speed.
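
The first idea in the abstract, temperature control driven by the entropy of each position's token distribution, can be illustrated with a minimal sketch. The abstract does not give the actual schedule, so everything specific here is an assumption: the linear mapping, its direction (low entropy gives a lower temperature), the bounds `t_min` and `t_max`, and all function names are hypothetical and not taken from the paper.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities at the given temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def normalized_entropy(probs):
    """Shannon entropy divided by log(vocab size), so the result lies in [0, 1].

    Assumes the distribution has at least two entries.
    """
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(len(probs))

def entropy_to_temperature(entropy, t_min=0.5, t_max=1.2):
    """Hypothetical linear map: entropy 0 -> t_min, entropy 1 -> t_max."""
    return t_min + (t_max - t_min) * entropy

def rescale_position(logits, t_min=0.5, t_max=1.2):
    """Return (temperature, probabilities) for a single token position."""
    h = normalized_entropy(softmax(logits))
    t = entropy_to_temperature(h, t_min, t_max)
    return t, softmax(logits, t)

# A flat distribution (high entropy) and a peaked one (low entropy).
t_flat, p_flat = rescale_position([0.0, 0.0, 0.0, 0.0])
t_peaked, p_peaked = rescale_position([10.0, 0.0, 0.0, 0.0])
```

Because the entropy is computed from logits the model already produces, a schedule of this kind adds only a small per-position calculation, which is consistent with the abstract's claim of no extra computational overhead. Swapping the arguments `t_min` and `t_max` reverses the direction of the mapping if the opposite behavior is wanted.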

Problem

Research questions and friction points this paper is trying to address.

Improving autoregressive image generation quality and speed
Addressing low information density in image tokens
Enhancing sampling efficiency with entropy-guided decoding

Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic temperature control using spatial entropy guidance
Entropy-aware acceptance rules for speculative decoding
Balancing diversity and coherence without extra computation
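
The entropy-aware acceptance idea can be sketched against the standard speculative sampling rule, which accepts a draft token `x` with probability `min(1, p(x) / q(x))`, where `p` is the target model's distribution and `q` is the draft model's. The relaxation below is a hypothetical illustration, not the paper's rule: it assumes acceptance is loosened in proportion to the normalized entropy of the target distribution, and the `relax` parameter and all function names are invented for this example.

```python
import math
import random

def normalized_entropy(probs):
    """Shannon entropy scaled to [0, 1]; assumes at least two entries."""
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(len(probs))

def acceptance_probability(p_target, q_draft, token, relax=0.5):
    """Probability of accepting a draft token.

    With relax=0 this is the standard rule min(1, p/q). With relax>0 the
    probability is pulled toward 1 when the target distribution is
    uncertain (hypothetical rule, assumed for illustration).
    Assumes q_draft[token] > 0, which holds if the token was sampled from q.
    """
    base = min(1.0, p_target[token] / q_draft[token])
    h = normalized_entropy(p_target)
    return base + relax * h * (1.0 - base)

def accept_draft_token(p_target, q_draft, token, rng, relax=0.5):
    """Draw the accept/reject decision using the supplied random generator."""
    return rng.random() < acceptance_probability(p_target, q_draft, token, relax)

# High-entropy target: many tokens are about equally plausible.
p = [0.25, 0.25, 0.25, 0.25]
q = [0.7, 0.1, 0.1, 0.1]
strict = acceptance_probability(p, q, token=0, relax=0.0)
relaxed = acceptance_probability(p, q, token=0, relax=0.5)
decision = accept_draft_token(p, q, token=1, rng=random.Random(0))
```

Any `relax` value above zero gives up the exact distribution-matching guarantee of standard speculative sampling in exchange for more accepted tokens. That trade-off is one way to read the abstract's "near-lossless" wording, though the paper's actual criterion may differ.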

Authors

Xiaoxiao Ma
Oracle, Macquarie University
LLM, deep generative models, anomaly detection, graph neural networks

Feng Zhao
University of Science and Technology of China

Pengyang Ling
University of Science and Technology of China
image restoration, image editing, image generation, controllable video generation

Haibo Qiu
University of Sydney
Multimodal LLM, Vision and Language, Computer Vision

Zhixiang Wei
University of Science and Technology of China

Hu Yu
University of Science and Technology of China

Jie Huang
University of Science and Technology of China

Zhixiong Zeng
Meituan

Lin Ma
Meituan