🤖 AI Summary
Masked diffusion language models (MDLMs) suffer from low inference efficiency due to fixed-step, sequential decoding. Existing parallel decoding methods (e.g., Fast-dLLM) rely on a static global confidence threshold, but block- and step-level confidence is highly volatile, which severely limits how well any single threshold generalizes. At the same time, confidence trajectories across samples within the same task are highly consistent (high cosine similarity). This work is the first to identify and exploit this task-level trajectory consistency: we propose a one-shot, single-sequence calibration mechanism that adaptively sets block- and step-level gating thresholds without per-sample tuning, enabling efficient parallel denoising at near-zero overhead. Evaluated on GPQA, GSM8K, and HumanEval, our method improves throughput by 45%, 24%, and 50%, respectively, while maintaining state-of-the-art accuracy.
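The consistency observation above can be illustrated with a minimal sketch: compute the cosine similarity between per-step confidence trajectories of two samples that share a common task-level shape. The trajectories here are synthetic stand-ins (the shared sinusoidal "signature", the noise scale, and the helper names are our assumptions, not the paper's data); in the actual setting they would be the model's per-step mean token confidences recorded during decoding.

```python
# Hedged illustration of task-level confidence-trajectory consistency.
# The trajectories are synthetic: a shared task "signature" plus small
# per-sample noise. All shapes and magnitudes here are assumptions.
import math
import random

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length sequences."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

rng = random.Random(0)
steps = [i / 63 for i in range(64)]  # 64 denoising steps, normalized to [0, 1]
# Shared task-level shape: mean confidence rising and falling over the steps.
base = [0.6 + 0.3 * math.sin(2 * math.pi * t) for t in steps]
# Two samples from the "same task": same signature, small independent noise.
traj_a = [c + rng.gauss(0, 0.01) for c in base]
traj_b = [c + rng.gauss(0, 0.01) for c in base]

sim = cosine_similarity(traj_a, traj_b)
print(f"cosine similarity: {sim:.3f}")  # close to 1.0 for same-task samples
```

When the shared signature dominates the per-sample noise, the similarity lands near 1.0, which is the property that makes a single calibration sequence representative of the whole task.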
📝 Abstract
Masked diffusion language models (MDLMs) are becoming competitive with their autoregressive counterparts but typically decode with fixed steps and sequential unmasking. To accelerate decoding, recent work such as Fast-dLLM enables parallel decoding via a static global confidence threshold, yet we observe strong block- and step-wise confidence fluctuations and, within a dataset, near-identical confidence trajectories across inputs as measured by cosine similarity. Motivated by these observations, we introduce One-Shot Dynamic Thresholding (OSDT), which calibrates thresholds on a single sequence and applies them to subsequent inputs with negligible overhead. On GPQA, GSM8K, and HumanEval, OSDT attains superior accuracy-throughput trade-offs (+24% tokens/s on GSM8K at the best accuracy, +45% on GPQA with comparable accuracy, and +50% on HumanEval with a modest accuracy gap). Beyond these results, our findings suggest broader opportunities to leverage reusable task-level confidence signatures for more general-purpose algorithmic and systems innovations in diffusion decoding.
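The one-shot calibration idea in the abstract can be sketched as follows. This is a hedged toy version, not the paper's exact algorithm: the function names, the multiplicative safety `margin`, and the specific numbers are illustrative assumptions. The core idea it shows is that one calibration pass yields per-step thresholds, which later inputs reuse to gate parallel unmasking.

```python
def calibrate_thresholds(confidence_trajectory, margin=0.95):
    """One-time calibration: turn a single sequence's recorded per-step
    confidences into per-step gating thresholds. The margin is an
    illustrative assumption, not a value from the paper."""
    return [margin * c for c in confidence_trajectory]

def parallel_unmask_step(token_confidences, threshold):
    """Commit (unmask) every masked position whose confidence clears
    this step's threshold; the rest stay masked for later steps."""
    return [c >= threshold for c in token_confidences]

# Calibrate once, on one sequence's per-step mean confidences.
calib_traj = [0.55, 0.70, 0.85, 0.95]
thresholds = calibrate_thresholds(calib_traj)

# For a later input at step 0, the dynamic threshold (0.95 * 0.55 = 0.5225)
# admits tokens that a static global threshold of, say, 0.9 would hold back.
confs = [0.60, 0.50, 0.90, 0.30]
print(parallel_unmask_step(confs, thresholds[0]))  # [True, False, True, False]
```

The contrast with a static scheme is the point: at early, low-confidence steps a fixed global threshold forces near-sequential decoding, while a calibrated per-step threshold lets more tokens commit in parallel without per-sample tuning.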