STDec: Spatio-Temporal Stability Guided Decoding for dLLMs

📅 2026-04-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
Most existing decoders for diffusion-based large language models (dLLMs) rely on a global confidence threshold, neglecting local contextual cues and temporal consistency across denoising steps, which limits inference efficiency. This work proposes STDec, a training-free, spatio-temporal stability-guided decoding method that, for the first time, identifies and leverages spatio-temporal stability in dLLM decoding. STDec combines a spatially aware mechanism, which dynamically generates token-adaptive thresholds from neighboring decoded states, with a temporally aware mechanism, which relaxes thresholds for tokens whose predicted identity stays consistent over time; both remain compatible with cache-based acceleration. Evaluated on both text and multimodal tasks, STDec substantially improves throughput, achieving a 14.17× speedup on the MBPP benchmark with LLaDA, while maintaining comparable task performance.
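The spatially aware threshold can be illustrated with a small sketch. The summary does not state STDec's exact aggregation rule, so everything below is an assumption: `spatial_thresholds`, `select_tokens`, the window size, and the `tau_high`/`tau_low` bounds are hypothetical names and values. Here the "neighboring decoded state" is taken to be the fraction of already-decoded positions within a fixed window, and the threshold is interpolated linearly between a strict and a relaxed value. The single-token fallback when nothing clears its threshold is borrowed from common threshold-based parallel decoding practice, not from the paper.

```python
from typing import List, Sequence


def spatial_thresholds(decoded: Sequence[bool], window: int = 2,
                       tau_high: float = 0.9, tau_low: float = 0.6) -> List[float]:
    """Hypothetical token-adaptive threshold (an illustration, not STDec's rule).

    For each position, take the fraction of already-decoded positions within
    `window` tokens on either side (excluding itself) and interpolate linearly
    from `tau_high` (no decoded neighbours) down to `tau_low` (all decoded).
    """
    n = len(decoded)
    thresholds = []
    for i in range(n):
        neighbours = [decoded[j]
                      for j in range(max(0, i - window), min(n, i + window + 1))
                      if j != i]
        frac = sum(bool(d) for d in neighbours) / len(neighbours) if neighbours else 0.0
        thresholds.append(tau_high - (tau_high - tau_low) * frac)
    return thresholds


def select_tokens(confidence: Sequence[float], decoded: Sequence[bool],
                  **kwargs) -> List[bool]:
    """Return a mask of still-masked positions to decode in this step."""
    tau = spatial_thresholds(decoded, **kwargs)
    # Accept a masked position only if its confidence clears its own threshold
    accept = [not d and c >= t for c, d, t in zip(confidence, decoded, tau)]
    masked = [i for i, d in enumerate(decoded) if not d]
    if masked and not any(accept):
        # Assumed fallback: always decode at least the most confident masked token
        best = max(masked, key=lambda i: confidence[i])
        accept[best] = True
    return accept
```

With `decoded = [1, 1, 0, 0, 0, 0]` and masked confidences of 0.8, position 2 (two decoded neighbours in its window) gets a threshold of about 0.75 and is accepted, while positions 3 (about 0.825) and 4 (0.9) are not; a single global threshold of 0.9 would accept none of them before the fallback.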
📝 Abstract
Diffusion Large Language Models (dLLMs) have progressed rapidly and are viewed as a promising alternative to the autoregressive paradigm. However, most dLLM decoders still adopt a global confidence threshold and do not explicitly model local context from neighboring decoded states or the temporal consistency of predicted token IDs across steps. To address this issue, we propose a simple spatio-temporal stability guided decoding approach, named STDec. We observe strong spatio-temporal stability in dLLM decoding: newly decoded tokens tend to lie near already-decoded neighbors, and their predicted IDs often remain consistent across several denoising steps. Inspired by this stability, STDec comprises spatial-aware decoding and temporal-aware decoding. Spatial-aware decoding dynamically generates a token-adaptive threshold by aggregating the decoded states of nearby tokens. Temporal-aware decoding relaxes the decoding threshold for tokens whose predicted token IDs remain consistent over denoising steps. STDec is training-free and remains compatible with cache-based acceleration methods. Across textual reasoning and multimodal understanding benchmarks, STDec substantially improves throughput while maintaining comparable task performance. Notably, on MBPP with LLaDA, STDec achieves up to a 14.17× speedup with a comparable score. Homepage: https://yzchen02.github.io/STDec.
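Temporal-aware decoding can be sketched in the same spirit. The abstract does not give the exact relaxation schedule, so the following is an assumption throughout: `TemporalStability` and `relaxed_thresholds` are hypothetical helpers that count how many consecutive denoising steps each position has predicted the same token ID, then lower that position's threshold by a fixed `delta` per stable step, down to a floor `tau_min`. `base_tau` may be a single global threshold or a per-token list, such as a spatially adaptive one; how STDec actually combines the two is not specified here.

```python
from typing import List, Sequence, Union


class TemporalStability:
    """Hypothetical tracker of how many consecutive denoising steps each
    position has predicted the same token ID (an illustration, not STDec's code)."""

    def __init__(self, length: int):
        # -1 as "no previous prediction" assumes real token IDs are non-negative
        self.prev_ids: List[int] = [-1] * length
        self.streak: List[int] = [0] * length

    def update(self, pred_ids: Sequence[int]) -> List[int]:
        # Extend the streak where the predicted ID is unchanged, reset it otherwise
        self.streak = [s + 1 if p == q else 0
                       for s, p, q in zip(self.streak, pred_ids, self.prev_ids)]
        self.prev_ids = list(pred_ids)
        return list(self.streak)


def relaxed_thresholds(base_tau: Union[float, Sequence[float]],
                       streak: Sequence[int],
                       delta: float = 0.05, tau_min: float = 0.5) -> List[float]:
    """Lower each threshold by `delta` per stable step, never below `tau_min`."""
    if isinstance(base_tau, (int, float)):
        base_tau = [float(base_tau)] * len(streak)
    return [max(tau_min, b - delta * s) for b, s in zip(base_tau, streak)]
```

For example, predicting IDs `[5, 7, 9]`, then `[5, 8, 9]` twice, yields streaks `[2, 1, 2]`, so a base threshold of 0.9 relaxes to about 0.8, 0.85 and 0.8. The floor keeps a long but possibly wrong streak from driving a threshold arbitrarily low.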
Problem

Research questions and friction points this paper is trying to address.

diffusion large language models
decoding
spatio-temporal stability
token consistency
local context
Innovation

Methods, ideas, or system contributions that make the work stand out.

spatio-temporal stability
diffusion LLMs
adaptive decoding threshold
training-free decoding
denoising consistency
Yuzhe Chen
Tianjin University
Jiale Cao
Tianjin University
Xuyang Liu
Sichuan University
Vision-language Models, Model Compression, Token Compression, Transfer Learning
Jin Xie
Chongqing University
Aiping Yang
Tianjin University
Yanwei Pang
Tianjin University
Computer Vision, Image Processing, Pattern Recognition, Machine Learning