🤖 AI Summary
This work proposes Order-Token Search, a novel decoding method for diffusion language models that jointly explores generation order and token values, addressing the limitation of conventional approaches that commit to a single deterministic trajectory. By leveraging a likelihood estimator grounded in the denoising process, the method scores and prunes multiple generation paths to efficiently identify high-potential sequences. This joint search strategy improves both decoding diversity and task performance. Empirical results show consistent gains over the backbone model across multiple benchmarks: absolute improvements of 3.1% on GSM8K, 3.8% on MATH500, 7.9% on Countdown, and 6.8% on HumanEval, matching or surpassing models post-trained with diffu-GRPO.
📝 Abstract
Diffusion Language Models (DLMs) offer order-agnostic generation that can explore many possible decoding trajectories. However, current decoding methods commit to a single trajectory, limiting exploration of the trajectory space. We introduce Order-Token Search, which explores this space by searching jointly over generation order and token values. Its core is a likelihood estimator that scores denoising actions, enabling stable pruning and efficient exploration of diverse trajectories. Across mathematical reasoning and coding benchmarks, Order-Token Search consistently outperforms baselines on GSM8K, MATH500, Countdown, and HumanEval (3.1%, 3.8%, 7.9%, and 6.8% absolute gains over the backbone), matching or surpassing diffu-GRPO post-trained d1-LLaDA. Our work establishes joint search as a key component for advancing decoding in DLMs.
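To make the joint-search idea concrete, here is a minimal, self-contained sketch of a beam search over (position, token) denoising actions. This is not the authors' implementation: the real method scores actions with a likelihood estimator derived from the DLM's denoising process, which is stubbed out here (`estimate_log_likelihood` is a hypothetical placeholder returning pseudo-scores), and `VOCAB`, `beam_width`, and the sequence length are illustrative assumptions.

```python
import heapq
import math
import random

# Hypothetical toy vocabulary; a real DLM operates over its tokenizer's vocab.
VOCAB = ["a", "b", "c"]

def estimate_log_likelihood(seq, pos, tok):
    """Stub for the likelihood estimator that scores a denoising action
    (fill position `pos` of partial sequence `seq` with token `tok`).
    The actual method derives this score from the denoising process;
    here we return a deterministic pseudo-score for illustration."""
    rng = random.Random(hash((tuple(seq), pos, tok)) & 0xFFFF)
    return math.log(rng.uniform(0.1, 1.0))

def order_token_search(length, beam_width=3):
    """Jointly search over generation order and token values.

    The beam holds partial sequences (None marks a still-masked position).
    Each step expands every (position, token) action, scores it, and keeps
    only the top `beam_width` candidates, pruning low-likelihood trajectories
    instead of committing to a single decoding order."""
    beam = [(0.0, [None] * length)]  # (cumulative log-likelihood, sequence)
    for _ in range(length):
        candidates = []
        for score, seq in beam:
            for pos, cell in enumerate(seq):
                if cell is not None:
                    continue  # this position was already denoised
                for tok in VOCAB:
                    new_seq = list(seq)
                    new_seq[pos] = tok
                    new_score = score + estimate_log_likelihood(seq, pos, tok)
                    candidates.append((new_score, new_seq))
        beam = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])
    return max(beam, key=lambda c: c[0])

best_score, best_seq = order_token_search(length=4)
print(best_seq)
```

The key difference from conventional left-to-right or confidence-greedy decoding is that the search branches over *which* position to denoise next as well as *what* token to place there, so distinct generation orders survive in the beam until pruned by the estimated likelihood.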