Breaking Block Boundaries: Anchor-based History-stable Decoding for Diffusion Large Language Models

📅 2026-04-10

📈 Citations: 0

✨ Influential: 0

career value

208K/year

🤖 AI Summary

This work addresses the limitation of semi-autoregressive decoding in diffusion-based large language models, where fixed block boundaries hinder timely cross-block generation of stable tokens. The authors propose a training-free, plug-and-play dynamic decoding strategy that systematically analyzes, for the first time, the relationship between token stability and convergence trends. By introducing dynamic anchors to monitor stability in real time, the method triggers cross-block speculative decoding immediately upon stability detection, thereby transcending rigid block constraints and enabling effective historical context integration with improved generation efficiency. Evaluated across language, vision–language, and audio–language tasks, the approach consistently enhances both performance and inference speed, achieving an 80% reduction in decoding steps and a 3.67% accuracy gain on the BBH benchmark.

Technology Category

Application Category

📝 Abstract

Diffusion Large Language Models (dLLMs) have recently become a promising alternative to autoregressive large language models (ARMs). Semi-autoregressive (Semi-AR) decoding is widely employed in base dLLMs and advanced decoding strategies due to its superior performance. However, our observations reveal that Semi-AR decoding suffers from inherent block constraints, which cause the decoding of many cross-block stable tokens to be unnecessarily delayed. To address this challenge, we systematically investigate the identification of stable tokens and present three key findings: (1) naive lookahead decoding is unreliable, (2) token stability closely correlates with convergence trend, and (3) historical information is isolated. Building on these insights, we propose Anchor-based History-stable Decoding (AHD), a training-free, plug-and-play dynamic decoding strategy. Specifically, AHD monitors the stability trend of tokens in real time through dynamic anchors. Once a token reaches stability, it initiates early cross-block decoding to enhance efficiency and performance. Extensive experiments across language, vision-language, and audio-language domains demonstrate that AHD simultaneously improves both performance and inference efficiency. Notably, AHD effectively reverses the performance degradation typically observed in existing advanced decoding acceleration strategies. For instance, on the BBH benchmark, our approach reduces decoding steps by 80% while improving performance by 3.67%.

Problem

Research questions and friction points this paper is trying to address.

diffusion language models

semi-autoregressive decoding

block boundaries

token stability

decoding efficiency

Innovation

Methods, ideas, or system contributions that make the work stand out.

diffusion language models

semi-autoregressive decoding

stable token detection