🤖 AI Summary
This work addresses the token commitment problem in diffusion language models that generate in parallel: deciding which proposed tokens to transfer into the partially decoded sequence at each decoding step. The authors frame this decision as a learnable trace-state policy and introduce TraceLock, a lightweight plug-in controller that leaves the backbone model frozen. TraceLock is trained with a self-supervised signal derived from future stability, that is, whether a proposed token matches the final token at its position once decoding completes. Once trained for a given backbone, it can be deployed without retraining or per-setting calibration. On question answering, mathematical reasoning, and code generation tasks, the method improves the quality-step trade-off over both heuristic and learned baselines and behaves stably across local-window widths, generation lengths, and step budgets.
📝 Abstract
Diffusion large language models promise faster generation by refining many token positions in parallel, but this parallelism introduces a hidden control problem: which proposed tokens should be transferred into the partially decoded sequence at each step? We refer to this decision as token commitment. Existing frozen-generator decoders largely rely on hand-designed confidence rules or block-specific acceptance filters. We argue that token commitment can instead be learned as a reusable trace-state policy. We introduce TraceLock, a lightweight plug-in controller that instantiates this policy for a frozen diffusion language model. Since oracle commitment times are unavailable, TraceLock derives self-supervision from future stability: at decoding step t, a proposed token for position i is labeled stable if it matches the final token at position i after the full decoding trace completes. The controller scores variable-length trace states and decides which active token proposals should be committed to the partially decoded sequence. Once trained for a given frozen backbone, the controller can be deployed across local-window widths, generation lengths, and step budgets without retraining or per-setting calibration. Experiments on question answering, mathematical reasoning, and code generation show that TraceLock improves the quality-step tradeoff over heuristic and learned baselines, with particularly stable behavior under cross-setting deployment. Diagnostic analyses show that its decisions are not reducible to scalar confidence, suggesting that frozen diffusion language models expose a learnable space of commitment trajectories beyond confidence-based decoding. Code is available at https://github.com/BobSun98/TraceLock.
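The future-stability labeling rule described in the abstract can be illustrated with a minimal sketch. It is not taken from the TraceLock repository: the function name `future_stability_labels`, the list-of-lists trace layout, and the toy token ids are all assumptions made for illustration, and the actual method may restrict labels to active (not yet committed) positions.

```python
from typing import List, Sequence


def future_stability_labels(
    proposals: Sequence[Sequence[int]],
    final_tokens: Sequence[int],
) -> List[List[int]]:
    """Label each proposed token as stable (1) or unstable (0).

    proposals[t][i] is the token id proposed for position i at decoding
    step t (hypothetical layout). A proposal is stable if it equals the
    token at position i after the full decoding trace has completed.
    """
    labels: List[List[int]] = []
    for t, step in enumerate(proposals):
        if len(step) != len(final_tokens):
            raise ValueError(f"step {t} has {len(step)} positions, "
                             f"expected {len(final_tokens)}")
        # Compare every proposal at this step with the final token.
        labels.append([int(tok == fin) for tok, fin in zip(step, final_tokens)])
    return labels


# Toy trace with 3 decoding steps over 3 positions (made-up token ids).
trace = [
    [5, 9, 2],
    [5, 7, 2],
    [5, 7, 3],
]
final = trace[-1]  # the completed sequence supplies the supervision target
labels = future_stability_labels(trace, final)
```

In this toy trace, position 0 is stable from the first step, position 1 becomes stable at the second step, and position 2 only at the last step, so a controller trained on such labels would have reason to commit position 0 early and hold back position 2.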