AI Summary
Diffusion language models often employ greedy decoding strategies that select high-confidence positions for demasking, which can lead to suboptimal generation orders and limited performance on complex reasoning tasks. This work proposes SOAR, a training-free decoding algorithm that introduces, for the first time in diffusion language models, a confidence-driven dynamic search mechanism: it expands the search space under low-confidence conditions to avoid premature commitments and enables parallel decoding across multiple positions when confidence is high to accelerate inference. By integrating adaptive beam search with position-level parallel demasking, SOAR significantly improves generation quality on benchmarks such as GSM8K, MBPP, and HumanEval when applied to Dream-7B and LLaDA-8B models, while maintaining efficient inference speed.
Abstract
Diffusion Language Models (DLMs) generate text by iteratively denoising a masked sequence, repeatedly deciding which positions to commit at each step. Standard decoding follows a greedy rule, unmasking the most confident positions, yet this local choice can lock the model into a suboptimal unmasking order, especially on reasoning-heavy prompts. We present SOAR, a training-free decoding algorithm that adapts its behavior to the model's uncertainty. When confidence is low, SOAR briefly widens the search over alternative unmasking decisions to avoid premature commitments; when confidence is high, it collapses the search and decodes many positions in parallel to reduce the number of denoising iterations. Across mathematical reasoning and code generation benchmarks (GSM8K, MBPP, HumanEval) on Dream-7B and LLaDA-8B, SOAR improves generation quality while maintaining competitive inference speed, offering a practical way to balance quality and efficiency in DLM decoding.
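The confidence-adaptive switch described in the abstract can be sketched as a per-step decision rule. The sketch below is illustrative only: the thresholds (`low_thresh`, `high_thresh`), the use of the per-position max token probability as a confidence proxy, and the exact trigger for widening the beam are assumptions, not SOAR's published criteria.

```python
import numpy as np

def soar_step(confidences, mask, low_thresh=0.5, high_thresh=0.9, beam_width=4):
    """Decide which masked positions to commit in one denoising step.

    confidences: per-position confidence, e.g. the max token probability
    at each position (an assumed proxy for the model's certainty).
    mask: boolean array, True where the position is still masked.

    Returns (positions, widen_search). If widen_search is True, `positions`
    are candidate positions for a beam expansion over alternative unmasking
    orders; otherwise they are committed (demasked) in parallel this step.
    """
    masked_idx = np.flatnonzero(mask)
    if masked_idx.size == 0:
        return np.array([], dtype=int), False

    conf = confidences[masked_idx]
    if conf.max() < low_thresh:
        # Low confidence everywhere: do not commit greedily; instead hand
        # the top-k candidates to a wider search over unmasking decisions.
        candidates = masked_idx[np.argsort(conf)[::-1][:beam_width]]
        return candidates, True

    # High confidence: collapse the search and commit every position above
    # the parallel threshold (at least the single best one) in one step.
    commit = masked_idx[conf >= high_thresh]
    if commit.size == 0:
        commit = masked_idx[[np.argmax(conf)]]
    return commit, False
```

In this sketch, the high-confidence branch is what cuts the number of denoising iterations (several positions resolved per step), while the low-confidence branch trades a temporarily larger search space for a better unmasking order.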