🤖 AI Summary
Diffusion Large Language Models (DLLMs) suffer from the Parallel-Sequential Contradiction (PSC), a fundamental tension between parallel decoding and the causal order that rigorous reasoning requires; under PSC, DLLMs degrade toward autoregressive-like behavior on complex tasks, limiting self-reflection, reasoning depth, and exploration breadth. This work formally introduces PSC and establishes a three-dimensional analytical framework spanning parallel, diffusion, and sequential scaling. The authors systematically evaluate and optimize DLLMs via behavioral analysis, remasking experiments, diffusion step scheduling, early stopping, and parallel prompt design. Experiments reveal that effective parallelism in DLLMs is largely confined to simple tasks; combining parallel-oriented prompting with diffusion early stopping reduces average decoding steps by 37%, significantly improving inference efficiency and cross-step consistency while exposing and alleviating deep-reasoning bottlenecks.
📝 Abstract
Recently, Diffusion Large Language Models (DLLMs) have offered high throughput and effective sequential reasoning, making them a competitive alternative to autoregressive LLMs (ALLMs). However, parallel decoding, which enables simultaneous token updates, conflicts with the causal order often required for rigorous reasoning. We identify this conflict as the core Parallel-Sequential Contradiction (PSC). Behavioral analyses on both simple and complex reasoning tasks show that DLLMs exhibit genuine parallelism only for directly decidable outputs. As task difficulty increases, they revert to autoregressive-like behavior, a limitation exacerbated by autoregressive prompting, which, combined with remasking, nearly doubles the number of decoding steps without improving quality. Moreover, PSC restricts DLLMs' self-reflection, reasoning depth, and exploratory breadth. To further characterize PSC, we introduce three scaling dimensions for DLLMs: parallel, diffusion, and sequential. Empirically, while parallel scaling yields consistent improvements, diffusion and sequential scaling are constrained by PSC. Based on these findings, we propose several practical mitigations, including parallel-oriented prompting, diffusion early stopping, and parallel scaling, to reduce PSC-induced ineffectiveness and inefficiency.
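To make the diffusion early stopping mitigation concrete, here is a minimal sketch of iterative masked decoding that halts as soon as every position is filled, rather than running a fixed step budget. The names (`predict`, `MASK`, `conf_threshold`) and the confidence-based commit rule are illustrative assumptions, not the paper's actual algorithm or API.

```python
MASK = "<mask>"

def decode_with_early_stopping(predict, seq, max_steps=64, conf_threshold=0.9):
    """Hedged sketch of diffusion-style parallel decoding with early stopping.

    `predict(seq)` stands in for the DLLM: it returns, for each position,
    a (token, confidence) pair. Each step commits every masked position the
    model is confident about (the parallel update); decoding stops early
    once no masked positions remain, instead of exhausting `max_steps`.
    """
    seq = list(seq)
    for step in range(1, max_steps + 1):
        masked = [i for i, t in enumerate(seq) if t == MASK]
        if not masked:
            return seq, step - 1  # nothing left to fill: stop early
        preds = predict(seq)
        committed = False
        for i in masked:
            tok, conf = preds[i]
            if conf >= conf_threshold:
                seq[i] = tok
                committed = True
        if not committed:
            # guarantee progress: commit the single most confident position,
            # which mimics the autoregressive-like fallback on hard tasks
            i = max(masked, key=lambda j: preds[j][1])
            seq[i] = preds[i][0]
    return seq, max_steps

# Toy model: always fully confident, so all positions decode in one step.
def toy_predict(seq):
    return [("x", 0.95) for _ in seq]

out, steps = decode_with_early_stopping(toy_predict, [MASK] * 4)
print(out, steps)  # ['x', 'x', 'x', 'x'] 1
```

With a uniformly confident model the whole sequence resolves in a single parallel step; a model that is only confident about one position per step degenerates to one commit per step, which is the autoregressive-like regime the paper attributes to PSC.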