STDec: Spatio-Temporal Stability Guided Decoding for dLLMs

📅 2026-04-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing diffusion large language models (dLLMs) typically decode with a single global confidence threshold. This ignores local context from neighboring decoded positions and the temporal consistency of predictions across denoising steps, which limits inference efficiency. This work proposes STDec, a training-free decoding method guided by spatio-temporal stability, a property the authors observe in dLLM decoding. STDec combines a spatial-aware mechanism, which sets a per-token threshold from the decoded states of nearby tokens, with a temporal-aware mechanism, which relaxes the threshold for tokens whose predicted IDs stay the same over several steps. It remains compatible with cache-based acceleration. On textual reasoning and multimodal understanding benchmarks, STDec substantially improves throughput while keeping task scores comparable, reaching up to a 14.17× speedup on MBPP with LLaDA.

📝 Abstract

Diffusion Large Language Models (dLLMs) have progressed rapidly and are viewed as a promising alternative to the autoregressive paradigm. However, most dLLM decoders still adopt a global confidence threshold and do not explicitly model local context from neighboring decoded states or the temporal consistency of predicted token IDs across steps. To address this issue, we propose a simple spatio-temporal stability guided decoding approach, named STDec. We observe strong spatio-temporal stability in dLLM decoding: newly decoded tokens tend to lie near already decoded neighbors, and their predicted IDs often remain consistent across several denoising steps. Inspired by this stability, STDec includes spatial-aware decoding and temporal-aware decoding. Spatial-aware decoding dynamically generates a token-adaptive threshold by aggregating the decoded states of nearby tokens. Temporal-aware decoding relaxes the decoding threshold for tokens whose predicted IDs remain consistent over denoising steps. STDec is training-free and remains compatible with cache-based acceleration methods. Across textual reasoning and multimodal understanding benchmarks, STDec substantially improves throughput while maintaining comparable task performance. Notably, on MBPP with LLaDA, STDec achieves up to 14.17x speedup with a comparable score. Homepage: https://yzchen02.github.io/STDec.
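
The abstract describes the two mechanisms only in words, so the following is a minimal sketch of how they might fit together, not the authors' implementation. Every name and constant in it (`spatial_thresholds`, `temporal_relax`, `update_streaks`, `select_tokens`, `base_tau`, `min_tau`, `window`, `relax`, `min_streak`, `floor`) is hypothetical. The linear aggregation rule is an assumption, and so is the fallback that decodes the single most confident masked token.

```python
from typing import List, Optional


def spatial_thresholds(decoded: List[bool], base_tau: float = 0.9,
                       min_tau: float = 0.6, window: int = 2) -> List[float]:
    """Assumed rule: the larger the fraction of decoded neighbors within
    `window` positions, the lower the per-token threshold (linear blend)."""
    n = len(decoded)
    taus = []
    for i in range(n):
        neighbors = [j for j in range(max(0, i - window), min(n, i + window + 1))
                     if j != i]
        frac = (sum(decoded[j] for j in neighbors) / len(neighbors)
                if neighbors else 0.0)
        taus.append(base_tau - (base_tau - min_tau) * frac)
    return taus


def update_streaks(prev_ids: Optional[List[int]], curr_ids: List[int],
                   streaks: List[int]) -> List[int]:
    """Count how many consecutive steps each position predicted the same ID."""
    if prev_ids is None:
        return [1] * len(curr_ids)
    return [s + 1 if p == c else 1
            for p, c, s in zip(prev_ids, curr_ids, streaks)]


def temporal_relax(taus: List[float], streaks: List[int], min_streak: int = 2,
                   relax: float = 0.1, floor: float = 0.5) -> List[float]:
    """Assumed rule: subtract `relax` from the threshold of positions whose
    predicted ID has been stable for at least `min_streak` steps."""
    return [max(floor, t - relax) if s >= min_streak else t
            for t, s in zip(taus, streaks)]


def select_tokens(decoded: List[bool], confidences: List[float],
                  taus: List[float]) -> List[int]:
    """Return masked positions whose confidence clears their own threshold."""
    picked = [i for i, (d, c, t) in enumerate(zip(decoded, confidences, taus))
              if not d and c >= t]
    if not picked:
        # Assumed fallback: decode the most confident masked position so
        # that every denoising step makes progress.
        masked = [i for i, d in enumerate(decoded) if not d]
        if masked:
            picked = [max(masked, key=lambda i: confidences[i])]
    return picked


if __name__ == "__main__":
    decoded = [True, True, False, False, False]
    confidences = [1.0, 1.0, 0.8, 0.85, 0.3]
    streaks = [0, 0, 1, 3, 1]
    taus = temporal_relax(spatial_thresholds(decoded, window=1), streaks)
    print(select_tokens(decoded, confidences, taus))
```

In this toy run, position 2 benefits from its decoded left neighbor and position 3 benefits from a stable prediction streak, so both can be decoded in one step. With a single global threshold of 0.9, neither would pass on confidence alone. The same selection step could in principle sit on top of a cached forward pass, since it only consumes per-position confidences and predicted IDs.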

Problem

Research questions and friction points this paper is trying to address.

diffusion large language models

decoding

spatio-temporal stability

token consistency

local context

Innovation

Methods, ideas, or system contributions that make the work stand out.

spatio-temporal stability

diffusion LLMs

adaptive decoding threshold