🤖 AI Summary
To address insufficient sample diversity, sampling collapse, and restricted prediction space in bit-level autoregressive image generation, this paper proposes DiverseAR. The method introduces two key innovations: (1) an adaptive logits scaling mechanism that dynamically modulates output confidence per bit position; and (2) an energy-function-guided sequence search algorithm that explicitly models and optimizes the global sequence energy. Built upon a bit-level visual tokenizer and smoothed binary classification probabilities, DiverseAR requires no additional parameters or post-processing. Experiments demonstrate that DiverseAR significantly outperforms existing bit-level autoregressive models across diversity metrics, including FID, LPIPS, and entropy, while maintaining or even improving fidelity as measured by Inception Score. To our knowledge, DiverseAR is the first bit-level autoregressive approach to jointly enhance diversity and fidelity.
📝 Abstract
In this paper, we investigate the underexplored challenge of sample diversity in autoregressive (AR) generative models with bitwise visual tokenizers. We first analyze the factors that limit diversity in bitwise AR models and identify two key issues: (1) the binary classification nature of bitwise modeling, which restricts the prediction space, and (2) the overly sharp logits distribution, which causes sampling collapse and reduces diversity. Building on these insights, we propose DiverseAR, a principled and effective method that enhances image diversity without sacrificing visual quality. Specifically, we introduce an adaptive logits distribution scaling mechanism that dynamically adjusts the sharpness of the binary output distribution during sampling, resulting in smoother predictions and greater diversity. To mitigate potential fidelity loss caused by distribution smoothing, we further develop an energy-based generation path search algorithm that avoids sampling low-confidence tokens, thereby preserving high visual quality. Extensive experiments demonstrate that DiverseAR substantially improves sample diversity in bitwise autoregressive image generation.
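The two ideas in the abstract can be illustrated with a toy sketch. The abstract does not give the paper's formulas, so everything here is an assumption made for illustration: the temperature rule in `adaptive_temperature`, the choice of negative log-likelihood under the unscaled probabilities as the energy, and the simplification that the bit logits are fixed rather than produced step by step by an autoregressive model. All function names are hypothetical.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def adaptive_temperature(logit, base_tau=1.0, gain=0.5):
    """Hypothetical rule: sharper logits (larger |z|) get a larger temperature."""
    return base_tau + gain * abs(logit)


def scaled_bit_probs(logits, base_tau=1.0, gain=0.5):
    """Smoothed P(bit = 1) for each bit position after logit scaling."""
    return [sigmoid(z / adaptive_temperature(z, base_tau, gain)) for z in logits]


def sample_bits(probs, rng):
    """Draw one independent Bernoulli sample per bit position."""
    return [1 if rng.random() < p else 0 for p in probs]


def path_energy(bits, probs, eps=1e-12):
    """Assumed energy: negative log-likelihood of the bits under `probs`."""
    return -sum(
        math.log(max(p if b else 1.0 - p, eps)) for b, p in zip(bits, probs)
    )


def search_low_energy_path(logits, num_candidates=8, seed=0):
    """Sample candidates from the smoothed distribution, then keep the one
    with the lowest energy under the original (unscaled) probabilities."""
    rng = random.Random(seed)
    sharp_probs = [sigmoid(z) for z in logits]
    smooth_probs = scaled_bit_probs(logits)
    candidates = [sample_bits(smooth_probs, rng) for _ in range(num_candidates)]
    return min(candidates, key=lambda bits: path_energy(bits, sharp_probs))


if __name__ == "__main__":
    toy_logits = [8.0, -8.0, 0.5, -3.0]  # made-up logits for one 4-bit token
    print(scaled_bit_probs(toy_logits))
    print(search_low_energy_path(toy_logits))
```

In this sketch, scaling pulls each bit probability toward 0.5 so that sampling explores more bit patterns, while the selection step discards candidates that the unscaled model considers unlikely. A larger `num_candidates` favors fidelity; a larger `gain` favors diversity.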