Beyond Static Cutoffs: One-Shot Dynamic Thresholding for Diffusion Language Models

📅 2025-11-03
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Masked diffusion language models (MDLMs) suffer from low inference efficiency due to fixed-step and sequential decoding. Existing parallel decoding methods (e.g., Fast-dLLM) rely on static global confidence thresholds, yet block- and step-level confidence exhibits high volatility; moreover, confidence trajectories across samples within the same task are highly consistent (with high cosine similarity), severely limiting threshold generalizability. This work is the first to identify and exploit this task-level confidence trajectory consistency. We propose a single-sequence calibration mechanism for dynamic thresholding: without per-sample tuning, one-time calibration adaptively adjusts block- and step-level gating thresholds, enabling efficient parallel denoising at near-zero overhead. Evaluated on GPQA, GSM8K, and HumanEval, our method achieves throughput improvements of 45%, 24%, and 50%, respectively, while maintaining state-of-the-art accuracy.

Technology Category

Application Category

📝 Abstract
Masked diffusion language models (MDLMs) are becoming competitive with their autoregressive counterparts but typically decode with fixed steps and sequential unmasking. To accelerate decoding, recent work such as Fast-dLLM enables parallel decoding via a static global confidence threshold, yet we observe strong block- and step-wise confidence fluctuations and, within a dataset, near-identical confidence trajectories across inputs as measured by cosine similarity. Motivated by these observations, we introduce One-Shot Dynamic Thresholding (OSDT), which calibrates thresholds on a single sequence and applies them to subsequent inputs with negligible overhead. On GPQA, GSM8K, and HumanEval, OSDT attains superior accuracy-throughput trade-offs (+24% tokens/s on GSM8K at the best accuracy, +45% on GPQA with comparable accuracy, and +50% on HumanEval with a modest accuracy gap). Beyond these results, our findings suggest broader opportunities to leverage reusable task-level confidence signatures for more general-purpose algorithmic and systems innovations in diffusion decoding.
Problem

Research questions and friction points this paper is trying to address.

Dynamic thresholding addresses confidence fluctuations in diffusion language models
Method accelerates decoding by leveraging reusable task-level confidence signatures
Improves accuracy-throughput tradeoffs across reasoning and coding benchmarks
Innovation

Methods, ideas, or system contributions that make the work stand out.

One-shot dynamic thresholding calibrates confidence thresholds
Applies calibrated thresholds across inputs with minimal overhead
Leverages reusable task-level confidence signatures for decoding
🔎 Similar Papers
No similar papers found.