DSB: Dynamic Sliding Block Scheduling for Diffusion LLMs

📅 2026-02-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing diffusion-based large language models rely on fixed block scheduling strategies that cannot adapt to variations in textual semantic complexity, which limits both generation quality and inference efficiency. This work proposes Dynamic Sliding Block (DSB) scheduling, which introduces, for the first time, a training-free dynamic block adjustment mechanism that adapts the block to semantic complexity. DSB is paired with a novel cache optimization, DSB Cache, that improves key-value (KV) cache management and significantly speeds up parallel decoding. Extensive experiments across multiple mainstream models and benchmarks show consistent gains in both generation quality and inference speed, confirming the method's generality and effectiveness.

📝 Abstract
Diffusion large language models (dLLMs) have emerged as a promising alternative for text generation, distinguished by their native support for parallel decoding. In practice, block inference is crucial for avoiding order misalignment in global bidirectional decoding and improving output quality. However, the widely-used fixed, predefined block (naive) schedule is agnostic to semantic difficulty, making it a suboptimal strategy for both quality and efficiency: it can force premature commitments to uncertain positions while delaying easy positions near block boundaries. In this work, we analyze the limitations of naive block scheduling and disclose the importance of dynamically adapting the schedule to semantic difficulty for reliable and efficient inference. Motivated by this, we propose Dynamic Sliding Block (DSB), a training-free block scheduling method that uses a sliding block with a dynamic size to overcome the rigidity of the naive block. To further improve efficiency, we introduce DSB Cache, a training-free KV-cache mechanism tailored to DSB. Extensive experiments across multiple models and benchmarks demonstrate that DSB, together with DSB Cache, consistently improves both generation quality and inference efficiency for dLLMs. Code is released at https://github.com/lizhuo-luo/DSB.
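The abstract describes a sliding block whose size adapts to semantic difficulty, committing confident positions early instead of following a fixed block grid. The sketch below is a hypothetical illustration of that idea, not the paper's implementation: the function names, the confidence stand-in, and the block-adaptation rule (widen on high mean confidence, shrink on low) are all assumptions made for readability.

```python
# Hypothetical sketch of dynamic sliding-block decoding for a diffusion LLM.
# All names and the adaptation rule are illustrative; see the paper's code
# at https://github.com/lizhuo-luo/DSB for the actual method.
import random

MASK = None  # placeholder for a still-masked position


def token_confidence(seq, i):
    """Stand-in for the model's per-position confidence (e.g. max softmax prob)."""
    random.seed(i)  # deterministic stub so the sketch is runnable
    return random.random()


def dsb_decode(length, base_block=8, min_block=2, max_block=16, threshold=0.9):
    seq = [MASK] * length
    left = 0            # left edge of the sliding block
    block = base_block  # current (dynamic) block size
    steps = 0
    while left < length:
        right = min(left + block, length)
        # Score the masked positions inside the current block.
        confs = {i: token_confidence(seq, i)
                 for i in range(left, right) if seq[i] is MASK}
        # Commit every position whose confidence clears the threshold,
        # but always commit at least the single most confident one so
        # decoding makes progress even on hard (low-confidence) text.
        best = max(confs, key=confs.get)
        for i, c in confs.items():
            if c >= threshold or i == best:
                seq[i] = f"tok{i}"  # stand-in for the sampled token
        # Slide the left edge past the committed prefix.
        while left < length and seq[left] is not MASK:
            left += 1
        # Adapt the block size: easy text (high mean confidence) widens the
        # block for more parallelism; hard text shrinks it.
        mean_c = sum(confs.values()) / len(confs)
        block = max(min_block,
                    min(max_block, int(base_block * (0.5 + mean_c))))
        steps += 1
    return seq, steps
```

Under this scheme, easy spans are decoded many positions per step while hard spans fall back toward one-at-a-time decoding, which is the intuition the abstract gives for why a dynamic block beats a fixed one on both quality and speed.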
Problem

Research questions and friction points this paper is trying to address.

diffusion LLMs
block scheduling
semantic difficulty
parallel decoding
inference efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic Sliding Block
Diffusion LLMs
Block Scheduling
Training-free Inference
KV-cache Optimization