When to Commit? Towards Variable-Size Self-Contained Blocks for Discrete Diffusion Language Models

📅 2026-04-26

🤖 AI Summary
Discrete diffusion language models are trained to denoise with full-sequence context, yet at inference they usually decode with fixed or heuristic chunking strategies that commit tokens without access to future context. This training-inference mismatch can cause premature token commitment. This work introduces the notion of "self-containedness" and formulates chunk boundary selection as a predictive-consistency test. Specifically, it dynamically identifies variable-sized self-contained chunks by measuring the KL divergence between token-level predictive distributions under future-aware (FA) and no-future (NF) conditioning. Experiments show that this approach outperforms fixed and heuristic chunking strategies, reducing premature-commitment errors while preserving generation quality.
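One way to write the FA/NF consistency test down concretely is shown below. This is a hedged formalization, not the paper's exact definition: the KL direction (FA relative to NF), the max-over-block aggregation, the threshold $\tau$, and the length bounds $L_{\min}, L_{\max}$ are illustrative assumptions. Here $s$ is the first uncommitted position, $e$ a candidate block end, $\mathcal{V}$ the vocabulary, $p^{\mathrm{FA}}_i$ the model's predictive distribution at position $i$ when it may attend beyond $e$, and $p^{\mathrm{NF}}_{i,e}$ the distribution when context is truncated at $e$.

$$
d_i(e) = \mathrm{KL}\bigl(p^{\mathrm{FA}}_i \,\|\, p^{\mathrm{NF}}_{i,e}\bigr) = \sum_{v \in \mathcal{V}} p^{\mathrm{FA}}_i(v)\,\log\frac{p^{\mathrm{FA}}_i(v)}{p^{\mathrm{NF}}_{i,e}(v)}, \qquad s \le i < e
$$

$$
S(s,e) = \max_{s \le i < e} d_i(e), \qquad
e^\star = \max\bigl\{\, e \in \{s+L_{\min}, \dots, s+L_{\max}\} : S(s,e) \le \tau \,\bigr\}
$$

If no candidate satisfies $S(s,e) \le \tau$, a natural fallback is $e^\star = s + L_{\min}$. A small $S(s,e)$ means revealing future context would barely change any prediction inside $[s, e)$, which is the sense in which the block is (approximately) self-contained.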

📝 Abstract
Discrete diffusion language models (dLLMs) enable parallel token updates with bidirectional attention, yet practical generation typically adopts blockwise semi-autoregressive decoding. This switch creates a training-inference mismatch: training denoises with full-sequence context, while inference commits tokens within a bounded block without future context. Consequently, decoding with fixed-size or heuristic blocks can lead to premature token commitments, because decisions are made without access to future context that could alter them. Motivated by this, we propose self-containedness as a principled criterion for block commitment. A block is self-contained if its predictions remain consistent with (Future-Aware, FA) or without (No-Future, NF) access to future context; this reframes block boundary selection as a test of self-containedness rather than a heuristic choice. Based on this principle, we introduce Variable-size Self-contained Blocks (VSB) for dLLMs. VSB scores and selects block boundaries using the divergence between token-level predictive distributions under NF and FA conditioning, which quantifies how predictions would change if future context were revealed. We provide theoretical justification linking self-containedness to predictive consistency, and extensive experiments validate VSB's efficacy over fixed-size and heuristic blockwise decoding.
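
The boundary-selection step described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `select_block_end`, the `predict` callback, `min_len`, `max_len`, and the threshold `tau` are hypothetical names, the score is assumed to be the maximum per-token KL(FA || NF) inside the candidate block, and the toy predictor stands in for the two forward passes (one that may attend past the block end, one truncated at it) a real dLLM would run.

```python
import math
from typing import Callable, List, Sequence, Tuple

Dist = Sequence[float]  # a probability distribution over a (toy) vocabulary


def kl_divergence(p: Dist, q: Dist, eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions; eps guards against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def block_score(fa: List[Dist], nf: List[Dist]) -> float:
    """Worst-case per-token FA-vs-NF divergence inside a block (assumed: max)."""
    return max(kl_divergence(p, q) for p, q in zip(fa, nf))


# Hypothetical callback: for a candidate block [start, end), return per-token
# future-aware and no-future distributions for positions start..end-1.
PredictFn = Callable[[int, int], Tuple[List[Dist], List[Dist]]]


def select_block_end(start: int, min_len: int, max_len: int,
                     predict: PredictFn, tau: float) -> int:
    """Largest end in [start+min_len, start+max_len] whose block scores <= tau.

    Falls back to the shortest allowed block if no candidate passes.
    """
    best = start + min_len
    for end in range(start + min_len, start + max_len + 1):
        fa, nf = predict(start, end)
        if block_score(fa, nf) <= tau:
            best = end  # later passing ends overwrite earlier ones
    return best


def toy_predict(start: int, end: int) -> Tuple[List[Dist], List[Dist]]:
    """Toy stand-in: positions >= 5 change their prediction without future context."""
    fa: List[Dist] = []
    nf: List[Dist] = []
    for i in range(start, end):
        fa.append([0.9, 0.1])
        nf.append([0.9, 0.1] if i < 5 else [0.2, 0.8])
    return fa, nf


print(select_block_end(0, 2, 8, toy_predict, tau=0.05))  # prints 5
```

In the toy run, every block ending at or before position 5 has identical FA and NF predictions (score 0), while any block that includes position 5 scores about 1.15, so the selected block is `[0, 5)`. Scanning all candidates rather than stopping at the first failure is a deliberate simplification, since with real NF conditioning the score need not grow monotonically with block length.
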
Problem

discrete diffusion language models
training-inference mismatch
blockwise decoding
token commitment
future context
Innovation

discrete diffusion language models
self-contained blocks
variable-size decoding
future-aware conditioning
predictive consistency