🤖 AI Summary
Diffusion Large Language Models (DLLMs) suffer from the Parallel-Sequential Contradiction (PSC), a fundamental tension between parallel decoding and the causal order that rigorous reasoning requires; under PSC, DLLMs degrade toward autoregressive-like behavior on complex tasks, limiting self-reflection, reasoning depth, and exploration breadth. This work formally introduces PSC and establishes a three-dimensional analytical framework spanning parallel, diffusion, and sequential scaling. The authors systematically evaluate and optimize DLLMs via behavioral analysis, remasking experiments, diffusion step scheduling, early stopping, and parallel prompt design. Experiments reveal that effective parallelism in DLLMs is largely confined to simple tasks; combining parallel-oriented prompting with diffusion early stopping reduces average decoding steps by 37%, significantly improving inference efficiency and cross-step consistency while exposing and alleviating deep-reasoning bottlenecks.
📝 Abstract
Recently, Diffusion Large Language Models (DLLMs) have offered high throughput and effective sequential reasoning, making them a competitive alternative to autoregressive LLMs (ALLMs). However, parallel decoding, which enables simultaneous token updates, conflicts with the causal order often required for rigorous reasoning. We identify this conflict as the core Parallel-Sequential Contradiction (PSC). Behavioral analyses on both simple and complex reasoning tasks show that DLLMs exhibit genuine parallelism only for directly decidable outputs. As task difficulty increases, they revert to autoregressive-like behavior, a limitation exacerbated by autoregressive prompting, which, combined with remasking, nearly doubles the number of decoding steps without improving quality. Moreover, PSC restricts DLLMs' self-reflection, reasoning depth, and exploratory breadth. To further characterize PSC, we introduce three scaling dimensions for DLLMs: parallel, diffusion, and sequential. Empirically, while parallel scaling yields consistent improvements, diffusion and sequential scaling are constrained by PSC. Based on these findings, we propose several practical mitigations, including parallel-oriented prompting, diffusion early stopping, and parallel scaling, to reduce PSC-induced ineffectiveness and inefficiency.
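To make the diffusion early stopping mitigation concrete, here is a minimal sketch of iterative masked decoding that halts as soon as every position is filled, rather than running a fixed step budget. The names (`predict`, `MASK`, `conf_threshold`) and the confidence-based commit rule are illustrative assumptions, not the paper's actual algorithm or API.

```python
MASK = "<mask>"

def decode_with_early_stopping(predict, seq, max_steps=64, conf_threshold=0.9):
    """Hedged sketch of diffusion-style parallel decoding with early stopping.

    `predict(seq)` stands in for the DLLM: it returns, for each position,
    a (token, confidence) pair. Each step commits every masked position the
    model is confident about (the parallel update); decoding stops early
    once no masked positions remain, instead of exhausting `max_steps`.
    """
    seq = list(seq)
    for step in range(1, max_steps + 1):
        masked = [i for i, t in enumerate(seq) if t == MASK]
        if not masked:
            return seq, step - 1  # nothing left to fill: stop early
        preds = predict(seq)
        committed = False
        for i in masked:
            tok, conf = preds[i]
            if conf >= conf_threshold:
                seq[i] = tok
                committed = True
        if not committed:
            # guarantee progress: commit the single most confident position,
            # which mimics the autoregressive-like fallback on hard tasks
            i = max(masked, key=lambda j: preds[j][1])
            seq[i] = preds[i][0]
    return seq, max_steps

# Toy model: always fully confident, so all positions decode in one step.
def toy_predict(seq):
    return [("x", 0.95) for _ in seq]

out, steps = decode_with_early_stopping(toy_predict, [MASK] * 4)
print(out, steps)  # ['x', 'x', 'x', 'x'] 1
```

With a uniformly confident model the whole sequence resolves in a single parallel step; a model that is only confident about one position per step degenerates to one commit per step, which is the autoregressive-like regime the paper attributes to PSC.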