🤖 AI Summary
Discrete diffusion language models (DDLMs) rival autoregressive models when scaled at training time, yet inference-time scaling for them remains underexplored. This paper addresses reward-guided high-quality text generation and proposes the first particle Gibbs sampling framework tailored to DDLMs, enabling joint iterative refinement over multiple trajectories. The method employs conditional sequential Monte Carlo as its transition kernel and refines trajectories toward a reward-weighted target distribution. It further provides the first systematic analysis of computational–performance trade-offs across four inference-time scaling axes: particle Gibbs iterations, particle count, denoising steps, and reward estimation cost. Experiments demonstrate that the approach consistently outperforms existing inference-time strategies across diverse compute budgets, achieving significant improvements in accuracy and generation quality on multiple reward-guided tasks.
📝 Abstract
Discrete diffusion models have emerged as a powerful paradigm for language modeling, rivaling auto-regressive models through training-time scaling. However, inference-time scaling in discrete diffusion models remains relatively under-explored. In this work, we study sampling-based approaches for achieving high-quality text generation from discrete diffusion models in reward-guided settings. We introduce a novel inference-time scaling approach based on particle Gibbs sampling for discrete diffusion models. The particle Gibbs sampling algorithm iteratively refines full diffusion trajectories using conditional Sequential Monte Carlo as its transition mechanism. This process ensures that the updated samples progressively improve and move closer to the reward-weighted target distribution. Unlike existing inference-time scaling methods, which are often limited to single diffusion trajectories, our approach leverages iterative refinement across multiple trajectories. Within this framework, we further analyze the trade-offs between four key axes for inference-time scaling under a fixed compute budget: particle Gibbs iterations, particle count, denoising steps, and reward estimation cost. Empirically, our method consistently outperforms prior inference-time strategies on reward-guided text generation tasks, achieving significant improvements in accuracy under varying compute budgets.
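To make the sampling loop described in the abstract concrete, here is a minimal, hedged sketch of particle Gibbs with a conditional SMC (CSMC) transition kernel. Everything in it — the two-character vocabulary, the random "denoiser" proposal, and the count-based reward — is an illustrative placeholder, not the paper's actual model or reward; only the control flow (clamp a retained reference trajectory as one particle, propagate the rest, resample by reward weight, then draw a new reference) reflects the algorithmic structure the abstract describes.

```python
import math
import random

VOCAB = list("ab")
SEQ_LEN = 8
N_STEPS = 4       # denoising steps
N_PARTICLES = 4   # particle count
N_PG_ITERS = 3    # particle Gibbs iterations

def propose_step(traj, rng):
    """Toy 'denoiser': randomly resample positions. A stand-in for one
    reverse step of a real discrete diffusion language model."""
    seq = list(traj[-1])
    for i in range(len(seq)):
        if rng.random() < 0.5:
            seq[i] = rng.choice(VOCAB)
    return traj + ["".join(seq)]

def reward(seq):
    """Toy reward: number of 'a' characters (placeholder task reward)."""
    return seq.count("a")

def weighted_pick(trajs, rng):
    """Sample one trajectory with probability proportional to exp(reward)."""
    ws = [math.exp(reward(t[-1])) for t in trajs]
    r, acc = rng.random() * sum(ws), 0.0
    for t, w in zip(trajs, ws):
        acc += w
        if r <= acc:
            return t
    return trajs[-1]

def csmc_sweep(ref_traj, rng):
    """One conditional SMC sweep: particle 0 is clamped to the retained
    reference trajectory; the others are proposed and resampled by
    reward weight, so the reference particle always survives."""
    trajs = [["b" * SEQ_LEN] for _ in range(N_PARTICLES)]
    for step in range(N_STEPS):
        trajs[0] = ref_traj[: step + 2]          # clamp reference path
        for k in range(1, N_PARTICLES):
            trajs[k] = propose_step(trajs[k], rng)
        # Reward-weighted resampling of the non-reference slots.
        trajs = [trajs[0]] + [
            list(weighted_pick(trajs, rng)) for _ in range(N_PARTICLES - 1)
        ]
    # Draw the next reference trajectory by final reward weight.
    return weighted_pick(trajs, rng)

def particle_gibbs(rng):
    # Initialize the reference with one unconditional denoising run,
    # then refine it with repeated CSMC sweeps.
    ref = ["b" * SEQ_LEN]
    for _ in range(N_STEPS):
        ref = propose_step(ref, rng)
    for _ in range(N_PG_ITERS):
        ref = csmc_sweep(ref, rng)
    return ref[-1]

sample = particle_gibbs(random.Random(0))
print(sample, reward(sample))
```

Because the retained reference trajectory is guaranteed to survive each sweep, successive samples cannot collapse to an arbitrary low-reward mode; this is the property that lets particle Gibbs iteratively refine across multiple trajectories rather than committing to a single one.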