When to Commit? Towards Variable-Size Self-Contained Blocks for Discrete Diffusion Language Models

📅 2026-04-26

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Discrete diffusion language models are trained to denoise with full-sequence context, but at inference they typically commit tokens block by block using fixed-size or heuristic block boundaries. Because tokens inside a block are committed without access to future context, this training-inference mismatch can lead to premature token commitment. This work introduces the notion of "self-containedness" and formulates block boundary selection as a test of predictive consistency. Specifically, it identifies variable-size self-contained blocks by measuring the KL divergence between token-level predictive distributions under future-aware (FA) and no-future (NF) conditioning. The reported experiments show the approach outperforming fixed-size and heuristic blockwise decoding, reducing premature commitment errors while preserving generation quality.
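
As a hedged illustration of the divergence mentioned above, the per-token score could be written as follows. The symbols p^FA_i and p^NF_i (the predictive distributions at position i with and without future context), the vocabulary V, and the direction of the KL divergence are assumptions made here for illustration; the summary does not specify them.

```latex
D_i \;=\; \mathrm{KL}\!\left(p^{\mathrm{FA}}_i \,\middle\|\, p^{\mathrm{NF}}_i\right)
    \;=\; \sum_{v \in V} p^{\mathrm{FA}}_i(v)\,\log\frac{p^{\mathrm{FA}}_i(v)}{p^{\mathrm{NF}}_i(v)}
```

Under this reading, D_i = 0 means that revealing future context would not change the prediction at position i, and a block whose tokens all have small D_i would count as self-contained.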

📝 Abstract

Discrete diffusion language models (dLLMs) enable parallel token updates with bidirectional attention, yet practical generation typically adopts blockwise semi-autoregressive decoding. This switch creates a training-inference mismatch: training denoises with full-sequence context, while inference commits tokens within a bounded block without future context. Therefore, decoding with fixed-size or heuristic-based blocks can lead to premature token commitments, as decisions are made without full access to future context that could alter those choices. Motivated by this, we propose self-containedness as a principled criterion for block commitment. A block is self-contained if its predictions remain consistent whether made with access to future context (Future-Aware, FA) or without it (No-Future, NF), reframing block boundary selection as a test of self-containedness rather than a heuristic choice. Based on this principle, we introduce Variable-size Self-contained Blocks (VSB) for dLLMs. VSB scores and selects block boundaries using the divergence between token-level predictive distributions under NF and FA conditioning, which quantifies how predictions would change if future context were revealed. We provide theoretical justification linking self-containedness to predictive consistency, and extensive experiments validate VSB's efficacy over fixed-size and heuristic blockwise decoding.
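
The boundary-scoring idea in the abstract can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the paper's implementation: the function names (`kl_divergence`, `block_score`, `select_block_size`), the mean-KL aggregation, the KL direction, the threshold `tau`, and the fallback rule are all hypothetical, and the toy distributions stand in for model outputs that a real dLLM would produce under FA and NF conditioning.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0.0)

def block_score(fa_dists, nf_dists):
    """Mean token-level KL(FA || NF) over a candidate block (assumed aggregation)."""
    kls = [kl_divergence(fa, nf) for fa, nf in zip(fa_dists, nf_dists)]
    return sum(kls) / len(kls)

def select_block_size(candidates, tau=0.05):
    """Pick a block size from {size: (fa_dists, nf_dists)}.

    Hypothetical rule: take the largest candidate whose score is at most tau;
    if none qualifies, fall back to the candidate with the lowest score.
    """
    scores = {size: block_score(fa, nf) for size, (fa, nf) in candidates.items()}
    passing = [size for size, score in scores.items() if score <= tau]
    chosen = max(passing) if passing else min(scores, key=scores.get)
    return chosen, scores

# Toy example with a two-word vocabulary. For block size 2, the FA and NF
# predictions agree; for block size 3, the third token's prediction changes
# once future context is revealed, so the larger block is not self-contained.
candidates = {
    2: ([[0.9, 0.1], [0.8, 0.2]],
        [[0.9, 0.1], [0.8, 0.2]]),
    3: ([[0.9, 0.1], [0.8, 0.2], [0.7, 0.3]],
        [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8]]),
}
chosen, scores = select_block_size(candidates)
print(chosen, scores)
```

In this toy setup the size-2 candidate is selected, because its predictions are identical with and without future context, while the size-3 candidate exceeds the assumed threshold.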

Problem

Research questions and friction points this paper is trying to address.

discrete diffusion language models

training-inference mismatch

blockwise decoding

token commitment

future context

Innovation

Methods, ideas, or system contributions that make the work stand out.

discrete diffusion language models

self-contained blocks

variable-size decoding