AI Summary
In next-token-prediction (NTP) image generation, rewards computed on intermediate token sequences correlate only weakly with final image quality, hindering effective pruning during test-time scaling. To address this, we propose the Fill-in Reward (FR) mechanism: a pretrained reward model estimates the quality of the *complete* token sequence obtained by "filling in" the missing tokens of an intermediate prefix, substantially improving the alignment between intermediate states and final-image rewards. We further introduce a dynamically weighted diversity reward to jointly optimize fidelity and diversity. Our method integrates token-confidence analysis, sequence-filling strategies, and adaptive reward weighting. Extensive experiments across multiple benchmarks and reward models demonstrate consistent, substantial improvements over state-of-the-art NTP image generation approaches, with marked gains in generation quality. The code is publicly available.
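The core idea above can be sketched in a few lines: instead of scoring a raw prefix, first complete it with some filling scheme and score the full sequence. The sketch below is a toy illustration, not the paper's implementation — the sequence length, the last-token filling scheme, and the `reward_model` stand-in are all assumptions for demonstration.

```python
SEQ_LEN = 16  # assumed full token-sequence length, for illustration only

def fill_in(prefix, seq_len=SEQ_LEN):
    """Complete a prefix to a full sequence with a cheap filling scheme.

    Toy scheme: repeat the last prefix token. The paper instead searches
    for a good filling scheme; this is only a placeholder.
    """
    if not prefix:
        return [0] * seq_len
    return prefix + [prefix[-1]] * (seq_len - len(prefix))

def reward_model(tokens):
    """Toy reward: fraction of even tokens (stand-in for an image reward)."""
    return sum(t % 2 == 0 for t in tokens) / len(tokens)

def fill_in_reward(prefix):
    """FR idea: score the filled-in (complete) sequence, not the raw prefix."""
    return reward_model(fill_in(prefix))

# Rank two intermediate prefixes by FR instead of their incomplete state.
candidates = [[2, 4, 6, 1], [1, 3, 5, 7]]
scores = [fill_in_reward(p) for p in candidates]
best = candidates[max(range(len(candidates)), key=lambda i: scores[i])]
```

The point of the sketch is the indirection: the reward model only ever sees complete sequences, which is why FR tracks the final-image reward more closely than scoring a partially decoded image would.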
Abstract
Test-time scaling (TTS) has become a prevalent technique in image generation, significantly boosting output quality by expanding the number of parallel samples and filtering them with pre-trained reward models. However, applying this powerful methodology to the next-token prediction (NTP) paradigm remains challenging. The primary obstacle is the low correlation between the reward of an image decoded from an intermediate token sequence and the reward of the fully generated image. Consequently, these incomplete intermediate representations are poor indicators for guiding pruning, a limitation that stems from their inherent incompleteness in scale or semantic content. To address this critical issue, we introduce the Filling-Based Reward (FR). This novel design estimates the approximate future trajectory of an intermediate sample by finding and applying a reasonable filling scheme to complete the sequence. Both the correlation coefficient between rewards of intermediate and final samples and multiple intrinsic signals, such as token confidence, indicate that FR provides a reliable metric for accurately evaluating the quality of intermediate samples. Building on this foundation, we propose FR-TTS, a sophisticated scaling strategy. FR-TTS efficiently searches for good filling schemes and incorporates a diversity reward with a dynamic weighting schedule to achieve a balanced and comprehensive evaluation of intermediate samples. We experimentally validate the superiority of FR-TTS on multiple established benchmarks and with various reward models. Code is available at https://github.com/xuhang07/FR-TTS.
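The pruning step described in the abstract — combining a fidelity score with a dynamically weighted diversity reward — can be sketched as follows. Everything here is an illustrative assumption: the Hamming-distance diversity measure, the linear decay schedule, and the `prune` signature are stand-ins, not the paper's actual components.

```python
def diversity(candidate, pool):
    """Toy diversity: mean normalized Hamming distance to the other candidates."""
    others = [c for c in pool if c is not candidate]
    if not others:
        return 0.0
    dist = lambda a, b: sum(x != y for x, y in zip(a, b)) / len(a)
    return sum(dist(candidate, o) for o in others) / len(others)

def diversity_weight(step, total_steps, w0=1.0):
    """Assumed schedule: emphasize diversity early, fidelity late (linear decay)."""
    return w0 * (1.0 - step / total_steps)

def prune(candidates, fidelity_scores, step, total_steps, keep=2):
    """Keep the top candidates under fidelity + weighted diversity."""
    w = diversity_weight(step, total_steps)
    combined = [
        f + w * diversity(c, candidates)
        for c, f in zip(candidates, fidelity_scores)
    ]
    order = sorted(range(len(candidates)), key=lambda i: -combined[i])
    return [candidates[i] for i in order[:keep]]
```

With a schedule like this, early pruning rounds preserve varied partial sequences while later rounds converge on the highest-fidelity candidates, which is the balance the dynamic weighting is meant to strike.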