🤖 AI Summary
Discrete diffusion models lack efficient test-time scaling mechanisms. To address this, we propose IterRef, a reward-guided test-time optimization method that requires no exhaustive search and makes no assumption that intermediate latent states are already aligned with the reward. At its core, IterRef progressively refines intermediate latent representations in situ via alternating noise-addition and denoising steps within the Multiple-Try Metropolis (MTM) framework. By integrating reward signals directly into the discrete diffusion sampling process, IterRef is the first method to combine MTM with discrete diffusion for test-time scaling. Experiments on text and image generation show that IterRef significantly improves generation quality, substantially outperforming state-of-the-art methods, especially under low computational budgets. This establishes a new paradigm for high-quality generation in resource-constrained settings.
📝 Abstract
Test-time scaling through reward-guided generation remains largely unexplored for discrete diffusion models, despite being a promising alternative. In this work, we introduce Iterative Reward-Guided Refinement (IterRef), a novel test-time scaling method tailored to discrete diffusion that leverages reward-guided noising-denoising transitions to progressively refine misaligned intermediate states. We formalize this process within a Multiple-Try Metropolis (MTM) framework and prove convergence to the reward-aligned distribution. Unlike prior methods, which assume the current state is already aligned with the reward distribution and only guide the subsequent transition, our approach explicitly refines each state in situ, progressively steering it toward the optimal intermediate distribution. Across both text and image domains, we evaluate IterRef on diverse discrete diffusion models and observe consistent improvements in reward-guided generation quality. In particular, IterRef achieves striking gains under low compute budgets, far surpassing prior state-of-the-art baselines.
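The Multiple-Try Metropolis machinery that underlies the method can be illustrated with a minimal, generic sketch. Everything below is an illustrative assumption, not the paper's implementation: the toy reward, the integer state space, and the symmetric random-walk proposal standing in for the paper's reward-guided noising-denoising transition. Each step draws several candidate states, selects one in proportion to the (reward-tilted) target, then accepts or rejects it with the MTM ratio so the chain converges to the target distribution.

```python
import math
import random

def mtm_step(x, log_target, propose, k, rng):
    """One Multiple-Try Metropolis step with a symmetric proposal kernel.

    Draws k candidates from the current state, selects one with probability
    proportional to the target, then draws a k-1 reference set from the
    selected candidate and accepts with the standard MTM ratio.
    """
    # k trial proposals from the current state
    ys = [propose(x, rng) for _ in range(k)]
    wy = [math.exp(log_target(y)) for y in ys]
    # select one candidate proportionally to its (unnormalized) weight
    y = rng.choices(ys, weights=wy, k=1)[0]
    # reference set: k-1 draws from the candidate, plus the current state
    xs = [propose(y, rng) for _ in range(k - 1)] + [x]
    wx = [math.exp(log_target(z)) for z in xs]
    accept = min(1.0, sum(wy) / sum(wx))
    return y if rng.random() < accept else x

# Toy setting (hypothetical): integer states 0..20 with a reward peaked
# at 15; the target distribution is proportional to exp(reward).
def reward(x):
    return -0.5 * (x - 15) ** 2

def propose(x, rng):
    # symmetric random-walk stand-in for a noise-then-denoise transition
    return min(20, max(0, x + rng.choice([-2, -1, 1, 2])))

rng = random.Random(0)
x = 0
samples = []
for step in range(2000):
    x = mtm_step(x, reward, propose, k=4, rng=rng)
    if step >= 500:  # discard burn-in
        samples.append(x)
mean = sum(samples) / len(samples)
```

After burn-in, the chain concentrates near the high-reward state, with the sample mean close to 15. In the paper's setting, the random-walk proposal would be replaced by the model's own noising-denoising transitions and the toy reward by the external reward model.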