Dependency-Aware Parallel Decoding via Attention for Diffusion LLMs

📅 2026-03-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Diffusion-based large language models (dLLMs) struggle to capture inter-token dependencies during parallel decoding, as each denoising step provides only marginal distributions. This work proposes DAPD, a training-free parallel decoding method that, for the first time, explicitly constructs a conditional dependency graph among masked tokens from self-attention. By selecting an independent set from this graph, DAPD enables decoupled parallel updates, preventing highly correlated tokens from being unmasked simultaneously. The approach requires neither auxiliary models nor retraining and significantly improves both the efficiency and accuracy of dLLMs under arbitrary generation orders. Experiments on LLaDA and Dream demonstrate superior trade-offs between decoding steps and generation quality.

📝 Abstract
Parallel decoding for diffusion LLMs (dLLMs) is difficult because each denoising step provides only token-wise marginal distributions, while unmasking multiple tokens simultaneously requires accounting for inter-token dependencies. We propose Dependency-Aware Parallel Decoding (DAPD), a simple, training-free decoding method that uses self-attention to induce a conditional dependency graph over masked tokens. At each iteration, edges in this graph capture strong token interactions, while non-edges indicate weak dependence. Parallel decoding is then reduced to selecting an independent set on the graph and unmasking the selected tokens in parallel. This avoids co-updating strongly coupled tokens without auxiliary models or retraining. Experiments on LLaDA and Dream show that DAPD improves the accuracy-steps trade-off over existing methods and enables more globally distributed parallel updates that better exploit the any-order generation capability of dLLMs.
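The abstract's pipeline — induce a dependency graph over masked tokens from self-attention, pick an independent set, and unmask those positions in parallel — can be sketched as follows. This is a minimal illustration, not the paper's exact procedure: the attention matrix, the per-token confidence scores, the threshold `tau`, and the greedy independent-set heuristic are all assumptions made for the sake of the example.

```python
import numpy as np

def dapd_select(attn, confidence, tau=0.1):
    """Pick an independent set of masked positions to unmask in parallel.

    attn:       (n, n) attention scores among the n masked tokens
                (hypothetical input; the paper derives the graph from
                self-attention, but the exact scoring may differ)
    confidence: (n,) per-token confidence from the marginal distributions
    tau:        edge threshold; attention above tau marks a strong dependency
    """
    n = attn.shape[0]
    # An edge (i, j) exists when attention is strong in either direction,
    # i.e. the two tokens are considered strongly coupled.
    adj = np.maximum(attn, attn.T) > tau
    np.fill_diagonal(adj, False)

    selected, blocked = [], set()
    # Greedy maximal independent set, highest-confidence tokens first:
    # once a token is selected, its strongly coupled neighbors are blocked,
    # so no two co-updated tokens share a strong-attention edge.
    for i in np.argsort(-confidence):
        if i in blocked:
            continue
        selected.append(int(i))
        blocked.update(np.flatnonzero(adj[i]).tolist())
    return selected
```

A caller would unmask the returned positions in one denoising step and leave the blocked (strongly coupled) tokens for later iterations, which is how the method avoids co-updating tokens whose joint distribution the marginals cannot represent.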
Problem

Research questions and friction points this paper is trying to address.

diffusion LLMs
parallel decoding
token dependencies
denoising
masked tokens
Innovation

Methods, ideas, or system contributions that make the work stand out.

parallel decoding
diffusion LLMs
dependency graph
self-attention
independent set