ParallelBench: Understanding the Trade-offs of Parallel Decoding in Diffusion LLMs

📅 2025-10-06
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Diffusion-based large language models (dLLMs) suffer degraded generation quality under parallel decoding on tasks with strong token dependencies, a consequence of their conditional independence assumption among tokens that existing benchmarks (e.g., mathematical reasoning or code generation) fail to capture. Method: We propose ParallelBench, the first evaluation benchmark designed specifically for parallel decoding in dLLMs; it formalizes these decoding limitations from an information-theoretic perspective and combines analytically tractable synthetic tasks with challenging real-world ones. Contribution/Results: Through conditional independence analysis, synthetic data modeling, and systematic evaluation of decoding strategies, we reveal severe quality degradation on complex tasks and the absence of adaptive parallelism control in current methods. Our findings expose a fundamental tension between decoding efficiency and generation fidelity, providing both theoretical grounding and empirical evidence to guide the design of dLLM decoding paradigms.

📝 Abstract
While most autoregressive LLMs are constrained to one-by-one decoding, diffusion LLMs (dLLMs) have attracted growing interest for their potential to dramatically accelerate inference through parallel decoding. Despite this promise, the conditional independence assumption in dLLMs causes parallel decoding to ignore token dependencies, inevitably degrading generation quality when these dependencies are strong. However, existing works largely overlook these inherent challenges, and evaluations on standard benchmarks (e.g., math and coding) are not sufficient to capture the quality degradation caused by parallel decoding. To address this gap, we first provide an information-theoretic analysis of parallel decoding. We then conduct case studies on analytically tractable synthetic list operations from both data distribution and decoding strategy perspectives, offering quantitative insights that highlight the fundamental limitations of parallel decoding. Building on these insights, we propose ParallelBench, the first benchmark specifically designed for dLLMs, featuring realistic tasks that are trivial for humans and autoregressive LLMs yet exceptionally challenging for dLLMs under parallel decoding. Using ParallelBench, we systematically analyze both dLLMs and autoregressive LLMs, revealing that: (i) dLLMs under parallel decoding can suffer dramatic quality degradation in real-world scenarios, and (ii) current parallel decoding strategies struggle to adapt their degree of parallelism based on task difficulty, thus failing to achieve meaningful speedup without compromising quality. Our findings underscore the pressing need for innovative decoding methods that can overcome the current speed-quality trade-off. We release our benchmark to help accelerate the development of truly efficient dLLMs.
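The abstract's core claim, that sampling tokens in parallel under a conditional independence assumption ignores dependencies between them, can be illustrated with a minimal toy example (the two-token phrases and probabilities below are hypothetical, not from the paper). A parallel decoder that samples each position from its marginal distribution puts mass on token pairs the true joint distribution never produces:

```python
from itertools import product

# Hypothetical joint distribution over two-token phrases:
# only the two coherent city names have nonzero probability.
joint = {
    ("New", "York"): 0.5,
    ("Los", "Angeles"): 0.5,
}

def marginal(position):
    """Per-position marginal, which is all a parallel decoder uses."""
    probs = {}
    for tokens, p in joint.items():
        probs[tokens[position]] = probs.get(tokens[position], 0.0) + p
    return probs

m0, m1 = marginal(0), marginal(1)

# Decoding both positions independently induces the product of
# marginals, which spreads mass onto incoherent pairs such as
# ("New", "Angeles") that have zero probability under the joint.
parallel = {(a, b): m0[a] * m1[b] for a, b in product(m0, m1)}

# Probability that an independent sample is an incoherent pair.
p_incoherent = sum(p for pair, p in parallel.items() if pair not in joint)
print(p_incoherent)  # 0.5
```

Here half of all parallel samples are incoherent, even though sequential decoding (sample position 0, then condition on it) would never produce one; this is the dependency-ignoring failure mode the benchmark is designed to surface.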
Problem

Research questions and friction points this paper is trying to address.

Analyzing the quality degradation that parallel decoding causes in diffusion LLMs
Evaluating the limitations of current parallel decoding strategies on real-world tasks
Developing a benchmark that exposes the speed-quality trade-off in diffusion LLMs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Proposes ParallelBench, a benchmark targeting parallel decoding in diffusion LLMs
Analyzes the limitations of parallel decoding via analytically tractable synthetic tasks
Identifies the need for decoding strategies that adapt parallelism to task difficulty
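To make the "adaptive parallelism" point concrete, here is a minimal sketch of the confidence-thresholded unmasking idea commonly used in dLLM decoders, the kind of strategy the paper finds unable to adapt its degree of parallelism to task difficulty. The function name, signature, and threshold are illustrative assumptions, not the paper's method:

```python
def adaptive_unmask_step(confidences, threshold=0.9):
    """One decoding step over masked positions: unmask every position
    whose model confidence meets the threshold, but always at least
    the single most confident one so decoding makes progress.
    Returns the indices chosen for this step."""
    chosen = [i for i, c in enumerate(confidences) if c >= threshold]
    if not chosen:
        # No position is confident enough: fall back to the argmax,
        # i.e. fully sequential decoding for this step.
        chosen = [max(range(len(confidences)), key=lambda i: confidences[i])]
    return chosen

# On an easy step many positions clear the threshold (high parallelism);
# on a hard step none do and the decoder degrades to one token per step.
print(adaptive_unmask_step([0.95, 0.4, 0.99, 0.7]))  # [0, 2]
print(adaptive_unmask_step([0.1, 0.2]))              # [1]
```

The benchmark's finding is that such fixed-threshold schemes do not reliably lower their parallelism on dependency-heavy tasks, so speedup comes at the cost of quality.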