🤖 AI Summary
Masked diffusion language models (MDLMs) suffer from low inference efficiency due to fixed-step, sequential decoding. Existing parallel decoding methods (e.g., Fast-dLLM) rely on a static global confidence threshold, but block- and step-level confidence is highly volatile, which severely limits how well any single threshold generalizes. At the same time, confidence trajectories across samples within the same task are highly consistent (high cosine similarity). This work is the first to identify and exploit this task-level trajectory consistency: we propose a one-shot, single-sequence calibration mechanism that adaptively sets block- and step-level gating thresholds without per-sample tuning, enabling efficient parallel denoising at near-zero overhead. Evaluated on GPQA, GSM8K, and HumanEval, our method improves throughput by 45%, 24%, and 50%, respectively, while maintaining state-of-the-art accuracy.
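The consistency observation above can be illustrated with a minimal sketch: compute the cosine similarity between per-step confidence trajectories of two samples that share a common task-level shape. The trajectories here are synthetic stand-ins (the shared sinusoidal "signature", the noise scale, and the helper names are our assumptions, not the paper's data); in the actual setting they would be the model's per-step mean token confidences recorded during decoding.

```python
# Hedged illustration of task-level confidence-trajectory consistency.
# The trajectories are synthetic: a shared task "signature" plus small
# per-sample noise. All shapes and magnitudes here are assumptions.
import math
import random

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length sequences."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

rng = random.Random(0)
steps = [i / 63 for i in range(64)]  # 64 denoising steps, normalized to [0, 1]
# Shared task-level shape: mean confidence rising and falling over the steps.
base = [0.6 + 0.3 * math.sin(2 * math.pi * t) for t in steps]
# Two samples from the "same task": same signature, small independent noise.
traj_a = [c + rng.gauss(0, 0.01) for c in base]
traj_b = [c + rng.gauss(0, 0.01) for c in base]

sim = cosine_similarity(traj_a, traj_b)
print(f"cosine similarity: {sim:.3f}")  # close to 1.0 for same-task samples
```

When the shared signature dominates the per-sample noise, the similarity lands near 1.0, which is the property that makes a single calibration sequence representative of the whole task.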
📝 Abstract
Masked diffusion language models (MDLMs) are becoming competitive with their autoregressive counterparts but typically decode with fixed steps and sequential unmasking. To accelerate decoding, recent work such as Fast-dLLM enables parallel decoding via a static global confidence threshold, yet we observe strong block- and step-wise confidence fluctuations and, within a dataset, near-identical confidence trajectories across inputs as measured by cosine similarity. Motivated by these observations, we introduce One-Shot Dynamic Thresholding (OSDT), which calibrates thresholds on a single sequence and applies them to subsequent inputs with negligible overhead. On GPQA, GSM8K, and HumanEval, OSDT attains superior accuracy-throughput trade-offs (+24% tokens/s on GSM8K at the best accuracy, +45% on GPQA with comparable accuracy, and +50% on HumanEval with a modest accuracy gap). Beyond these results, our findings suggest broader opportunities to leverage reusable task-level confidence signatures for more general-purpose algorithmic and systems innovations in diffusion decoding.
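The one-shot calibration idea in the abstract can be sketched as follows. This is a hedged toy version, not the paper's exact algorithm: the function names, the multiplicative safety `margin`, and the specific numbers are illustrative assumptions. The core idea it shows is that one calibration pass yields per-step thresholds, which later inputs reuse to gate parallel unmasking.

```python
def calibrate_thresholds(confidence_trajectory, margin=0.95):
    """One-time calibration: turn a single sequence's recorded per-step
    confidences into per-step gating thresholds. The margin is an
    illustrative assumption, not a value from the paper."""
    return [margin * c for c in confidence_trajectory]

def parallel_unmask_step(token_confidences, threshold):
    """Commit (unmask) every masked position whose confidence clears
    this step's threshold; the rest stay masked for later steps."""
    return [c >= threshold for c in token_confidences]

# Calibrate once, on one sequence's per-step mean confidences.
calib_traj = [0.55, 0.70, 0.85, 0.95]
thresholds = calibrate_thresholds(calib_traj)

# For a later input at step 0, the dynamic threshold (0.95 * 0.55 = 0.5225)
# admits tokens that a static global threshold of, say, 0.9 would hold back.
confs = [0.60, 0.50, 0.90, 0.30]
print(parallel_unmask_step(confs, thresholds[0]))  # [True, False, True, False]
```

The contrast with a static scheme is the point: at early, low-confidence steps a fixed global threshold forces near-sequential decoding, while a calibrated per-step threshold lets more tokens commit in parallel without per-sample tuning.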