The Path Matters: Learning a Token-Commitment Policy for Diffusion Language Models

📅 2026-05-23
🤖 AI Summary
This work addresses the token commitment problem in diffusion language models that generate in parallel: at each decoding step, which proposed tokens should be transferred into the partially decoded sequence? The authors frame this decision as a learnable trace-state policy and introduce TraceLock, a lightweight plug-in controller for a frozen backbone model, which is not fine-tuned. TraceLock is trained with a self-supervised signal derived from future stability: a proposed token is labeled stable if it matches the final token at the same position once decoding completes. Once trained for a given backbone, the controller can be deployed across local-window widths, generation lengths, and step budgets without retraining or per-setting calibration. On question answering, mathematical reasoning, and code generation, the method improves the quality-step tradeoff over heuristic and learned baselines, with particularly stable behavior under cross-setting deployment.
📝 Abstract
Diffusion large language models promise faster generation by refining many token positions in parallel, but this parallelism introduces a hidden control problem: which proposed tokens should be transferred into the partially decoded sequence at each step? We refer to this decision as token commitment. Existing frozen-generator decoders largely rely on hand-designed confidence rules or block-specific acceptance filters. We argue that token commitment can instead be learned as a reusable trace-state policy. We introduce TraceLock, a lightweight plug-in controller that instantiates this policy for a frozen diffusion language model. Since oracle commitment times are unavailable, TraceLock derives self-supervision from future stability: at decoding step t, a proposed token for position i is labeled stable if it matches the final token at position i after the full decoding trace completes. The controller scores variable-length trace states and decides which active token proposals should be committed to the partially decoded sequence. Once trained for a given frozen backbone, the controller can be deployed across local-window widths, generation lengths, and step budgets without retraining or per-setting calibration. Experiments on question answering, mathematical reasoning, and code generation show that TraceLock improves the quality-step tradeoff over heuristic and learned baselines, with particularly stable behavior under cross-setting deployment. Diagnostic analyses show that its decisions are not reducible to scalar confidence, suggesting that frozen diffusion language models expose a learnable space of commitment trajectories beyond confidence-based decoding. Code is available at https://github.com/BobSun98/TraceLock.
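The future-stability labeling rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `future_stability_labels`, the list-of-lists trace representation, and the toy token IDs are all assumptions made for illustration, and the sketch labels every position rather than only the active (uncommitted) proposals.

```python
from typing import List, Sequence


def future_stability_labels(
    proposals: Sequence[Sequence[int]],
    final_tokens: Sequence[int],
) -> List[List[int]]:
    """Return labels[t][i] = 1 if the token proposed at step t for
    position i equals the final token at position i, else 0.

    Hypothetical helper: the name and data layout are illustrative only.
    """
    labels = []
    for t, step_proposals in enumerate(proposals):
        if len(step_proposals) != len(final_tokens):
            raise ValueError(f"step {t} has a different length than the final sequence")
        labels.append([int(p == f) for p, f in zip(step_proposals, final_tokens)])
    return labels


# Toy decoding trace: 3 steps, 4 positions (token IDs are made up).
trace = [
    [5, 9, 2, 7],
    [5, 3, 2, 1],
    [5, 3, 8, 1],
]
# Assumption: the last step's proposals are the final decoded sequence.
final = trace[-1]

labels = future_stability_labels(trace, final)
for t, row in enumerate(labels):
    print(f"step {t}: {row}")
```

In this toy trace, position 0 is stable from the first step, while position 2 only becomes stable at the last step. Labels of this kind could serve as binary training targets for a controller that scores proposals, though how TraceLock encodes trace states and makes the commit decision is not specified here.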
Problem

Research questions and friction points this paper is trying to address.

token commitment
diffusion language models
parallel generation
decoding control
sequence generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

token commitment
diffusion language models
self-supervised learning
trace-state policy
frozen decoder