Deferred Commitment Decoding for Diffusion Language Models

📅 2026-01-05
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
This work addresses the Boundary-Induced Context Truncation (BICT) problem in block-wise decoding of diffusion language models: tokens near block boundaries are committed prematurely, without access to nearby future context, which degrades generation quality. The authors propose Deferred Commitment Decoding (DCD), a training-free strategy that dynamically schedules token commitment via a confidence-aware sliding window: low-uncertainty tokens are emitted early, while high-uncertainty tokens are deferred until sufficient contextual information becomes available, enabling bidirectional information flow within the window. DCD introduces, for the first time, an uncertainty-based dynamic token-commitment mechanism that remains compatible with KV caching and preserves inference efficiency. Experiments across multiple benchmarks and models demonstrate consistent improvements, with an average gain of 1.39% in generation accuracy and up to 9.0% at peak.

📝 Abstract
Diffusion language models (DLMs) have recently emerged as a strong alternative to autoregressive models by enabling parallel text generation. To improve inference efficiency and KV-cache compatibility, prior work commonly adopts block-based diffusion, decoding tokens block by block. However, this paradigm suffers from a structural limitation that we term Boundary-Induced Context Truncation (BICT): undecoded tokens near block boundaries are forced to commit without access to nearby future context, even when such context could substantially reduce uncertainty. This limitation degrades decoding certainty and generation quality, especially for tasks requiring precise reasoning, such as mathematical problem solving and code generation. We propose Deferred Commitment Decoding (DCD), a novel, training-free decoding strategy that mitigates this issue. DCD maintains a certainty-aware sliding window over masked tokens, resolving low-uncertainty tokens early while deferring high-uncertainty tokens until sufficient contextual evidence becomes available. Extensive experiments across multiple diffusion language models, benchmarks, and caching configurations show that DCD improves generation accuracy by 1.73% with comparable time on average compared to fixed block-based diffusion methods, with the most significant improvement reaching 16.5%. These results demonstrate that deferring token commitment based on uncertainty is a simple yet effective principle for improving both the quality and efficiency of diffusion language model decoding.
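The commit-or-defer loop described in the abstract can be sketched as follows. This is a minimal, hypothetical illustration, not the paper's implementation: the `predict` callable stands in for one diffusion denoising pass, and the window size, confidence threshold `tau`, and stall-avoidance rule are all assumptions made for the sketch.

```python
# Hypothetical sketch of Deferred Commitment Decoding (DCD) with a
# confidence-aware sliding window. `predict` is a placeholder for one
# diffusion denoising pass; all hyperparameters here are illustrative.

def dcd_decode(predict, seq_len, window=4, tau=0.9, max_steps=100):
    """Decode `seq_len` masked positions.

    predict(committed, positions) -> {pos: (token, confidence)} scores
    every still-masked position inside the current window.
    """
    committed = {}          # pos -> committed token
    start = 0               # left edge of the sliding window
    for _ in range(max_steps):
        if start >= seq_len:
            break
        window_pos = [p for p in range(start, min(start + window, seq_len))
                      if p not in committed]
        if not window_pos:
            start += 1
            continue
        preds = predict(committed, window_pos)
        # Commit only high-confidence tokens; defer the rest so they can
        # condition on more right-context in later passes.
        newly = [p for p in window_pos if preds[p][1] >= tau]
        if not newly:
            # Avoid stalling: force-commit the single most confident token.
            newly = [max(window_pos, key=lambda p: preds[p][1])]
        for p in newly:
            committed[p] = preds[p][0]
        # Advance past the longest fully committed prefix, which keeps the
        # scheme compatible with prefix KV caching.
        while start in committed:
            start += 1
    return [committed.get(p) for p in range(seq_len)]
```

In contrast to fixed block-based decoding, the window edge never forces a low-confidence token to commit while a later pass could still supply disambiguating context; only the stall-avoidance rule above ever forces a commitment.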
Problem

Research questions and friction points this paper is trying to address.

Diffusion Language Models
Boundary-Induced Context Truncation
Block-based Decoding
Decoding Confidence
Contextual Uncertainty
Innovation

Methods, ideas, or system contributions that make the work stand out.

Deferred Commitment Decoding
Diffusion Language Models
Confidence-Aware Sliding Window
Boundary-Induced Context Truncation
Block-Based Diffusion