Beyond Confidence: Adaptive and Coherent Decoding for Diffusion Language Models

📅 2025-11-26

📈 Citations: 0

✨ Influential: 0

career value

171K/year

🤖 AI Summary

Existing diffusion language models (DLMs) rely on local, step-wise metrics—such as token-level confidence—for decoding, lacking global consistency modeling, which leads to incoherent generation and unstable sampling trajectories. To address this, we propose the first history-consistency modeling framework grounded in conditional mutual information (CMI), integrating a trajectory correction mechanism and an adaptive unmasking scheduling algorithm for dynamic allocation of decoding budget. Our key contributions are: (i) the first application of CMI to quantify how historical context constrains current token sampling, thereby enhancing semantic coherence; and (ii) consistency-aware dynamic sampling that significantly reduces redundant computation. Evaluated on the Dream and LLaDA benchmarks, our method achieves up to 3.48× inference speedup and a 3.91% improvement in generation quality, demonstrating superior efficiency and robustness.

Technology Category

Application Category

📝 Abstract

Diffusion Language Models (DLMs) have recently achieved significant success due to their any-order generation capabilities. However, existing inference methods typically rely on local, immediate-step metrics such as confidence or entropy which inherently lack a more reliable perspective. This limitation frequently leads to inconsistent sampling trajectories and suboptimal generation quality. To address this, we propose Coherent Contextual Decoding (CCD), a novel inference framework built upon two core innovations. First, CCD employs a trajectory rectification mechanism that leverages historical context to enhance sequence coherence, enabling the early rejection of suboptimal paths. We demonstrate that this mechanism is theoretically equivalent to modeling the consistency of historical steps via the conditional mutual information between context and token predictions. Building on this theoretical insight, we further address the inefficiency of conventional uniform decoding budgets. Instead of rigid allocations based on diffusion steps, we introduce an adaptive sampling strategy that dynamically adjusts the unmasking budget for each step according to our consistency metric. Consequently, our method significantly improves the quality of generation trajectories while accelerating the sampling process. Empirically, our method achieves a simultaneous enhancement in both inference speed and performance across diverse benchmarks on Dream and LLaDA, delivering up to 3.48x speedup alongside 3.91% performance improvement.

Problem

Research questions and friction points this paper is trying to address.

Improves DLM generation coherence via historical context rectification

Replaces uniform decoding with adaptive sampling for efficiency

Enhances both inference speed and performance across benchmarks

Innovation

Methods, ideas, or system contributions that make the work stand out.

Trajectory rectification using historical context for coherence

Adaptive sampling strategy adjusting unmasking budget dynamically

Conditional mutual information modeling for consistency in decoding

🔎 Similar Papers

No similar papers found.