🤖 AI Summary
Autoregressive image generation is limited in both quality and decoding speed because image tokens carry low information density and are distributed unevenly across space. To address this, we propose an entropy-driven efficient decoding framework. Our key contributions are: (1) a spatial entropy-guided dynamic temperature scheduling mechanism that balances token diversity and structural consistency; (2) an entropy-aware acceptance criterion for speculative decoding that improves token acceptance reliability; and (3) a lightweight design compatible with both mask-based and scale-wise autoregressive architectures. Extensive experiments across multiple benchmarks and model variants show that our method preserves near-lossless generation quality at about 85% of the inference cost of conventional acceleration approaches, outperforming existing decoding strategies in both efficiency and fidelity.
📝 Abstract
In this work, we first revisit the sampling issues in current autoregressive (AR) image generation models and identify that image tokens, unlike text tokens, exhibit lower information density and a non-uniform spatial distribution. Accordingly, we present an entropy-informed decoding strategy that improves autoregressive generation quality while speeding up synthesis. Specifically, the proposed method introduces two main innovations: 1) dynamic temperature control guided by the spatial entropy of token distributions, which improves the balance between content diversity, alignment accuracy, and structural coherence in both mask-based and scale-wise models without extra computational overhead, and 2) entropy-aware acceptance rules in speculative decoding, which achieve near-lossless generation at about 85% of the inference cost of conventional acceleration methods. Extensive experiments across multiple benchmarks using diverse AR image generation models demonstrate the effectiveness and generalizability of our approach in enhancing both generation quality and sampling speed.
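To make the first idea concrete, the following is a minimal sketch of entropy-guided temperature selection for a single token position. It is not the authors' implementation: the abstract does not specify how entropy is mapped to temperature, so the linear map, its direction (higher entropy gives higher temperature), and the names `entropy_to_temperature`, `t_min`, and `t_max` are all assumptions made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits at a given temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_to_temperature(h, vocab_size, t_min=0.5, t_max=1.2):
    """Hypothetical linear map from normalized entropy to a temperature.

    Assumption: confident (low-entropy) positions are sampled at a lower
    temperature and uncertain (high-entropy) positions at a higher one.
    The source does not state the actual mapping or its direction.
    """
    if vocab_size < 2:
        raise ValueError("vocab_size must be at least 2")
    h_norm = h / math.log(vocab_size)       # scale to roughly [0, 1]
    h_norm = min(max(h_norm, 0.0), 1.0)     # clamp against rounding error
    return t_min + (t_max - t_min) * h_norm


def dynamic_temperature_probs(logits, t_min=0.5, t_max=1.2):
    """Return (rescaled probabilities, chosen temperature) for one position."""
    base = softmax(logits)                  # distribution at temperature 1
    t = entropy_to_temperature(entropy(base), len(logits), t_min, t_max)
    return softmax(logits, t), t
```

In a full decoder this per-position rule would be applied across the spatial token grid, so that each region of the image gets its own temperature. Because it only reuses logits the model already produces, it is consistent with the abstract's claim of no extra computational overhead, although the exact schedule used in the paper may differ.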