🤖 AI Summary
Discrete diffusion models lack efficient test-time scaling mechanisms. To address this, we propose IterRef, a reward-guided test-time optimization method that requires no exhaustive search and makes no assumption that intermediate latent states are already aligned with the reward. At its core, IterRef progressively refines intermediate latent representations in situ via alternating noise-addition and denoising steps within the Multiple-Try Metropolis (MTM) framework. By integrating reward signals directly into the discrete diffusion sampling process, IterRef is the first method to combine MTM with discrete diffusion for test-time scaling. Experiments on text and image generation show that IterRef significantly improves generation quality, substantially outperforming state-of-the-art methods, especially under low computational budgets. This establishes a new paradigm for high-quality generation in resource-constrained settings.
📝 Abstract
Test-time scaling through reward-guided generation remains largely unexplored for discrete diffusion models, despite being a promising alternative. In this work, we introduce Iterative Reward-Guided Refinement (IterRef), a novel test-time scaling method tailored to discrete diffusion that leverages reward-guided noising-denoising transitions to progressively refine misaligned intermediate states. We formalize this process within a Multiple-Try Metropolis (MTM) framework and prove convergence to the reward-aligned distribution. Unlike prior methods, which assume the current state is already aligned with the reward distribution and only guide the subsequent transition, our approach explicitly refines each state in situ, progressively steering it toward the optimal intermediate distribution. Across both text and image domains, we evaluate IterRef on diverse discrete diffusion models and observe consistent improvements in reward-guided generation quality. In particular, IterRef achieves striking gains under low compute budgets, far surpassing prior state-of-the-art baselines.
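The Multiple-Try Metropolis machinery that underlies the method can be illustrated with a minimal, generic sketch. Everything below is an illustrative assumption, not the paper's implementation: the toy reward, the integer state space, and the symmetric random-walk proposal standing in for the paper's reward-guided noising-denoising transition. Each step draws several candidate states, selects one in proportion to the (reward-tilted) target, then accepts or rejects it with the MTM ratio so the chain converges to the target distribution.

```python
import math
import random

def mtm_step(x, log_target, propose, k, rng):
    """One Multiple-Try Metropolis step with a symmetric proposal kernel.

    Draws k candidates from the current state, selects one with probability
    proportional to the target, then draws a k-1 reference set from the
    selected candidate and accepts with the standard MTM ratio.
    """
    # k trial proposals from the current state
    ys = [propose(x, rng) for _ in range(k)]
    wy = [math.exp(log_target(y)) for y in ys]
    # select one candidate proportionally to its (unnormalized) weight
    y = rng.choices(ys, weights=wy, k=1)[0]
    # reference set: k-1 draws from the candidate, plus the current state
    xs = [propose(y, rng) for _ in range(k - 1)] + [x]
    wx = [math.exp(log_target(z)) for z in xs]
    accept = min(1.0, sum(wy) / sum(wx))
    return y if rng.random() < accept else x

# Toy setting (hypothetical): integer states 0..20 with a reward peaked
# at 15; the target distribution is proportional to exp(reward).
def reward(x):
    return -0.5 * (x - 15) ** 2

def propose(x, rng):
    # symmetric random-walk stand-in for a noise-then-denoise transition
    return min(20, max(0, x + rng.choice([-2, -1, 1, 2])))

rng = random.Random(0)
x = 0
samples = []
for step in range(2000):
    x = mtm_step(x, reward, propose, k=4, rng=rng)
    if step >= 500:  # discard burn-in
        samples.append(x)
mean = sum(samples) / len(samples)
```

After burn-in, the chain concentrates near the high-reward state, with the sample mean close to 15. In the paper's setting, the random-walk proposal would be replaced by the model's own noising-denoising transitions and the toy reward by the external reward model.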