AI Summary
In next-token-prediction (NTP) image generation, rewards computed on intermediate token sequences correlate only weakly with final image quality, hindering effective pruning during test-time scaling. To address this, we propose the Fill-in Reward (FR) mechanism: a pretrained reward model estimates the quality of the *complete* token sequence obtained by "filling in" the missing tokens of an intermediate prefix, substantially improving the alignment between intermediate states and final-image rewards. We further introduce a dynamically weighted diversity reward to jointly optimize fidelity and diversity. Our method integrates token-confidence analysis, sequence-filling strategies, and adaptive reward weighting. Extensive experiments across multiple benchmarks and reward models demonstrate consistent, substantial improvements over state-of-the-art NTP image generation approaches, with marked gains in generation quality. The code is publicly available.
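The core idea above can be sketched in a few lines: instead of scoring a raw prefix, first complete it with some filling scheme and score the full sequence. The sketch below is a toy illustration, not the paper's implementation — the sequence length, the last-token filling scheme, and the `reward_model` stand-in are all assumptions for demonstration.

```python
SEQ_LEN = 16  # assumed full token-sequence length, for illustration only

def fill_in(prefix, seq_len=SEQ_LEN):
    """Complete a prefix to a full sequence with a cheap filling scheme.

    Toy scheme: repeat the last prefix token. The paper instead searches
    for a good filling scheme; this is only a placeholder.
    """
    if not prefix:
        return [0] * seq_len
    return prefix + [prefix[-1]] * (seq_len - len(prefix))

def reward_model(tokens):
    """Toy reward: fraction of even tokens (stand-in for an image reward)."""
    return sum(t % 2 == 0 for t in tokens) / len(tokens)

def fill_in_reward(prefix):
    """FR idea: score the filled-in (complete) sequence, not the raw prefix."""
    return reward_model(fill_in(prefix))

# Rank two intermediate prefixes by FR instead of their incomplete state.
candidates = [[2, 4, 6, 1], [1, 3, 5, 7]]
scores = [fill_in_reward(p) for p in candidates]
best = candidates[max(range(len(candidates)), key=lambda i: scores[i])]
```

The point of the sketch is the indirection: the reward model only ever sees complete sequences, which is why FR tracks the final-image reward more closely than scoring a partially decoded image would.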
Abstract
Test-time scaling (TTS) has become a prevalent technique in image generation, significantly boosting output quality by expanding the number of parallel samples and filtering them with pre-trained reward models. However, applying this powerful methodology to the next-token prediction (NTP) paradigm remains challenging. The primary obstacle is the low correlation between the reward of an image decoded from an intermediate token sequence and the reward of the fully generated image. Consequently, these incomplete intermediate representations are poor indicators for guiding pruning, a limitation that stems from their inherent incompleteness in scale or semantic content. To address this critical issue, we introduce the Filling-Based Reward (FR). This novel design estimates the approximate future trajectory of an intermediate sample by finding and applying a reasonable filling scheme to complete the sequence. Both the correlation coefficient between rewards of intermediate and final samples and multiple intrinsic signals, such as token confidence, indicate that FR provides a reliable metric for accurately evaluating the quality of intermediate samples. Building on this foundation, we propose FR-TTS, a sophisticated scaling strategy. FR-TTS efficiently searches for good filling schemes and incorporates a diversity reward with a dynamic weighting schedule to achieve a balanced and comprehensive evaluation of intermediate samples. We experimentally validate the superiority of FR-TTS on multiple established benchmarks and with various reward models. Code is available at https://github.com/xuhang07/FR-TTS.
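The pruning step described in the abstract — combining a fidelity score with a dynamically weighted diversity reward — can be sketched as follows. Everything here is an illustrative assumption: the Hamming-distance diversity measure, the linear decay schedule, and the `prune` signature are stand-ins, not the paper's actual components.

```python
def diversity(candidate, pool):
    """Toy diversity: mean normalized Hamming distance to the other candidates."""
    others = [c for c in pool if c is not candidate]
    if not others:
        return 0.0
    dist = lambda a, b: sum(x != y for x, y in zip(a, b)) / len(a)
    return sum(dist(candidate, o) for o in others) / len(others)

def diversity_weight(step, total_steps, w0=1.0):
    """Assumed schedule: emphasize diversity early, fidelity late (linear decay)."""
    return w0 * (1.0 - step / total_steps)

def prune(candidates, fidelity_scores, step, total_steps, keep=2):
    """Keep the top candidates under fidelity + weighted diversity."""
    w = diversity_weight(step, total_steps)
    combined = [
        f + w * diversity(c, candidates)
        for c, f in zip(candidates, fidelity_scores)
    ]
    order = sorted(range(len(candidates)), key=lambda i: -combined[i])
    return [candidates[i] for i in order[:keep]]
```

With a schedule like this, early pruning rounds preserve varied partial sequences while later rounds converge on the highest-fidelity candidates, which is the balance the dynamic weighting is meant to strike.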