FR-TTS: Test-Time Scaling for NTP-based Image Generation with Effective Filling-based Reward Signal

πŸ“… 2025-11-29
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
In next-token-prediction (NTP)-based image generation, intermediate sequence rewards exhibit weak correlation with final image quality, hindering effective test-time scaling for pruning. To address this, we propose the Fill-in Reward (FR) mechanism: a pretrained reward model estimates the quality of the *complete* token sequence obtainable by β€œfilling in” missing tokens from an intermediate prefix, thereby significantly improving alignment between intermediate states and final-image rewards. We further introduce a dynamically weighted diversity reward to jointly optimize fidelity and diversity. Our method integrates token confidence analysis, sequence filling strategies, and adaptive reward weighting. Extensive experiments across multiple benchmarks and reward models demonstrate consistent and substantial improvements over state-of-the-art NTP image generation approaches, with marked gains in generation quality. The code is publicly available.
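The core idea can be sketched in a few lines: rather than scoring an incomplete token prefix directly, complete it first and score the completed sequence. The helper names below (`fill_in`, `policy_fill`, `reward_model`) are illustrative stand-ins, not the paper's API, and the toy reward simply counts "good" tokens; it is a minimal sketch of the filling-based reward idea, assuming a fixed-length token sequence.

```python
SEQ_LEN = 16  # toy full-sequence length

def fill_in(prefix, policy_fill, seq_len=SEQ_LEN):
    """Complete an intermediate token prefix to full length.

    `policy_fill` stands in for the generator sampling the remaining
    tokens (a hypothetical helper, not the paper's exact interface).
    """
    return prefix + [policy_fill(i) for i in range(len(prefix), seq_len)]

def filling_based_reward(prefix, policy_fill, reward_model):
    """Score the *completed* sequence instead of the raw prefix (FR idea)."""
    return reward_model(fill_in(prefix, policy_fill))

# Toy stand-ins: reward = fraction of "good" tokens (value 1) in the sequence.
reward_model = lambda seq: sum(seq) / len(seq)
policy_fill = lambda i: 1  # a generator that would complete with good tokens

prefix = [1, 0, 1, 1]                    # an intermediate 4-token sample
naive = reward_model(prefix)             # scores the incomplete prefix
fr = filling_based_reward(prefix, policy_fill, reward_model)
```

Here the naive prefix reward undervalues the sample relative to its achievable completion, which is exactly the misalignment FR is designed to correct.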

πŸ“ Abstract
Test-time scaling (TTS) has become a prevalent technique in image generation, significantly boosting output quality by expanding the number of parallel samples and filtering them using pre-trained reward models. However, applying this powerful methodology to the next-token prediction (NTP) paradigm remains challenging. The primary obstacle is the low correlation between the reward of an image decoded from an intermediate token sequence and the reward of the fully generated image. Consequently, these incomplete intermediate representations prove to be poor indicators for guiding the pruning direction, a limitation that stems from their inherent incompleteness in scale or semantic content. To effectively address this critical issue, we introduce the Filling-Based Reward (FR). This novel design estimates the approximate future trajectory of an intermediate sample by finding and applying a reasonable filling scheme to complete the sequence. Both the correlation coefficient between rewards of intermediate samples and final samples, as well as multiple intrinsic signals like token confidence, indicate that the FR provides an excellent and reliable metric for accurately evaluating the quality of intermediate samples. Building upon this foundation, we propose FR-TTS, a sophisticated scaling strategy. FR-TTS efficiently searches for good filling schemes and incorporates a diversity reward with a dynamic weighting schedule to achieve a balanced and comprehensive evaluation of intermediate samples. We experimentally validate the superiority of FR-TTS on multiple established benchmarks and with various reward models. Code is available at https://github.com/xuhang07/FR-TTS.
Problem

Research questions and friction points this paper is trying to address.

Improving test-time scaling for next-token prediction image generation
Addressing low correlation between intermediate and final image rewards
Providing reliable quality evaluation for incomplete token sequences
Innovation

Methods, ideas, or system contributions that make the work stand out.

FR-TTS uses filling-based reward to estimate future image quality
It searches for good filling schemes to complete intermediate sequences
Incorporates diversity reward with dynamic weighting for balanced evaluation
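The selection step described above can be sketched as a pruning routine that ranks intermediate samples by their filling-based reward plus a decaying diversity bonus. The linear decay schedule and the function names here are illustrative assumptions (the paper specifies only that the weighting is dynamic), so treat this as a minimal sketch rather than the authors' implementation.

```python
def tts_select(candidates, fr_scores, diversity, step, total_steps, keep=2):
    """Keep the top-`keep` intermediate samples at a given generation step.

    Combined score = FR + w * diversity, where w decays linearly over the
    generation (an assumed schedule): diversity matters early, fidelity late.
    """
    w = 1.0 - step / total_steps
    scored = [(fr + w * div, cand)
              for cand, fr, div in zip(candidates, fr_scores, diversity)]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [cand for _, cand in scored[:keep]]

# Usage: the same pool is pruned differently early vs. late in generation.
early = tts_select(["a", "b", "c"], [0.9, 0.5, 0.6], [0.0, 1.0, 0.2], step=0, total_steps=10)
late = tts_select(["a", "b", "c"], [0.9, 0.5, 0.6], [0.0, 1.0, 0.2], step=10, total_steps=10)
```

Early on, the diverse sample "b" survives despite a lower FR score; at the final step the weight vanishes and selection reduces to pure fidelity ranking.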
Hang Xu
MoE Key Lab of BIPC, USTC
Linjiang Huang
BUAA · CUHK · CASIA
Computer Vision · Pattern Recognition · Machine Learning
Feng Zhao
MoE Key Lab of BIPC, USTC