Lookahead Unmasking Elicits Accurate Decoding in Diffusion Language Models

📅 2025-11-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
Masked Diffusion Models (MDMs) decode by iteratively unmasking tokens, so their performance depends heavily on ordering heuristics such as confidence-based sampling, which are myopic, accumulate early errors, and cannot exploit extra test-time compute. This work proposes Lookahead Unmasking (LookUM), a decoding framework that requires no external reward model: it reformulates decoding as path selection over possible unmasking orders, coupling a path generator that proposes paths by sampling from pools of unmasking sets with a verifier that scores each path's sequence-level uncertainty and performs importance sampling to select the final paths. Crucially, only two to three parallel decoding paths suffice for efficient inference. Evaluated across six benchmarks spanning mathematical reasoning, planning, and code generation, the method consistently outperforms existing baselines. Base LLaDA with LookUM rivals RL-tuned LLaDA 1.5, and LookUM further improves LLaDA 1.5 itself, demonstrating both effectiveness and scalability.

📝 Abstract
Masked Diffusion Models (MDMs) as language models generate by iteratively unmasking tokens, yet their performance crucially depends on the inference-time order of unmasking. Prevailing heuristics, such as confidence-based sampling, are myopic: they optimize locally, fail to leverage extra test-time compute, and let early decoding mistakes cascade. We propose Lookahead Unmasking (LookUM), which addresses these concerns by reformulating sampling as path selection over all possible unmasking orders without the need for an external reward model. Our framework couples (i) a path generator that proposes paths by sampling from pools of unmasking sets with (ii) a verifier that computes the uncertainty of the proposed paths and performs importance sampling to subsequently select the final paths. Empirically, erroneous unmasking measurably inflates sequence-level uncertainty, and our method exploits this to avoid error-prone trajectories. We validate our framework across six benchmarks spanning mathematics, planning, and coding, and demonstrate consistent performance improvements. LookUM requires only two to three paths to achieve peak performance, demonstrating remarkably efficient path selection. The consistent improvements on both LLaDA and post-trained LLaDA 1.5 are particularly striking: base LLaDA with LookUM rivals the performance of RL-tuned LLaDA 1.5, while LookUM further enhances LLaDA 1.5 itself, showing that uncertainty-based verification provides orthogonal benefits to reinforcement learning and underscoring the versatility of our framework. Code will be publicly released.
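The verifier idea in the abstract can be illustrated with a minimal sketch: score each candidate unmasking path by a sequence-level uncertainty (here, mean token entropy, one common choice), then importance-sample the final path with weights that favor low-uncertainty trajectories. All function names, the entropy-based score, and the softmax-style weighting below are illustrative assumptions, not the paper's actual implementation.

```python
import math
import random

def token_entropy(probs):
    """Shannon entropy of one token's predictive distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def path_uncertainty(path_probs):
    """Sequence-level uncertainty: mean entropy over tokens
    unmasked along one candidate path (an assumed proxy)."""
    return sum(token_entropy(p) for p in path_probs) / len(path_probs)

def select_path(candidate_paths, temperature=1.0, rng=random):
    """Importance-sample one path, weighting low-uncertainty paths
    more heavily. candidate_paths is a list of (path, path_probs)
    pairs, where path_probs holds the per-token predictive
    distributions encountered along that unmasking order."""
    scores = [math.exp(-path_uncertainty(pp) / temperature)
              for _, pp in candidate_paths]
    total = sum(scores)
    weights = [s / total for s in scores]
    # Draw an index proportionally to its normalized weight.
    r, acc = rng.random(), 0.0
    for (path, _), w in zip(candidate_paths, weights):
        acc += w
        if r <= acc:
            return path
    return candidate_paths[-1][0]
```

A confident (peaked) distribution yields lower entropy than a uniform one, so paths whose unmasking decisions were confident receive higher sampling weight; lowering `temperature` makes the selection greedier, consistent with the finding that two to three paths already suffice.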
Problem

Research questions and friction points this paper is trying to address.

Improves unmasking order selection in diffusion language models
Reduces error cascades by evaluating multiple decoding paths
Enhances performance without requiring external reward models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Proposes path selection over unmasking orders
Uses generator-verifier framework for uncertainty evaluation
Achieves efficient decoding with minimal path sampling