Beyond Surface Reasoning: Unveiling the True Long Chain-of-Thought Capacity of Diffusion Large Language Models

📅 2025-10-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
Diffusion Large Language Models (DLLMs) suffer from the Parallel-Sequential Contradiction (PSC): a fundamental tension between parallel decoding and the causal order that rigorous reasoning requires. Under PSC, DLLMs degrade toward autoregressive-like behavior on complex tasks, which limits their self-reflection, reasoning depth, and exploration breadth. This work formally introduces PSC and characterizes it along three scaling dimensions (parallel, diffusion, and sequential), systematically evaluating and optimizing DLLMs via behavioral analysis, remasking experiments, diffusion step scheduling, early stopping, and parallel prompt design. Experiments reveal that effective parallelism in DLLMs is largely confined to simple, directly decidable outputs; combining parallel-oriented prompting with diffusion early stopping reduces average decoding steps by 37%, improving inference efficiency and cross-step consistency while exposing and partially alleviating deep-reasoning bottlenecks.
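The summary's decoding-efficiency claims are easiest to see in the decode loop itself. Below is a minimal sketch, not the paper's implementation, of confidence-thresholded parallel decoding for a masked-diffusion LM with an early-stopping check; the `score_positions` stub, the threshold value, and the single-token fallback are illustrative assumptions.

```python
import random

MASK = "<mask>"

def score_positions(tokens):
    """Toy stand-in for a masked-diffusion LM forward pass: for each
    masked position, return a (candidate_token, confidence) pair.
    A real DLLM would derive these from the model's logits."""
    return {
        i: (f"tok{i}", random.random())
        for i, t in enumerate(tokens) if t == MASK
    }

def decode(seq_len=16, max_steps=64, threshold=0.9):
    """Parallel decoding with early stopping: each step unmasks every
    position whose confidence clears the threshold (parallel update);
    if none clears it, commit only the single best position, i.e. the
    autoregressive-like fallback the PSC analysis predicts on hard
    inputs. Stop as soon as no masked positions remain."""
    tokens = [MASK] * seq_len
    for step in range(1, max_steps + 1):
        scores = score_positions(tokens)
        if not scores:                      # early stop: fully decoded
            return tokens, step - 1
        confident = {i: tc for i, tc in scores.items() if tc[1] >= threshold}
        if not confident:                   # fallback: best single token
            best = max(scores, key=lambda i: scores[i][1])
            confident = {best: scores[best]}
        for i, (tok, _) in confident.items():
            tokens[i] = tok
    return tokens, max_steps

tokens, steps = decode()
print(f"decoded in {steps} steps:", " ".join(tokens))
```

On easy inputs many positions clear the threshold at once and the loop terminates early; on hard inputs the fallback branch dominates and the step count approaches the sequence length, which is the degradation toward autoregressive behavior described above.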

📝 Abstract
Recently, Diffusion Large Language Models (DLLMs) have offered high throughput and effective sequential reasoning, making them a competitive alternative to autoregressive LLMs (ALLMs). However, parallel decoding, which enables simultaneous token updates, conflicts with the causal order often required for rigorous reasoning. We first identify this conflict as the core Parallel-Sequential Contradiction (PSC). Behavioral analyses on both simple and complex reasoning tasks show that DLLMs exhibit genuine parallelism only for directly decidable outputs. As task difficulty increases, they revert to autoregressive-like behavior, a limitation exacerbated by autoregressive prompting, which nearly doubles the number of decoding steps through remasking without improving quality. Moreover, PSC restricts DLLMs' self-reflection, reasoning depth, and exploratory breadth. To further characterize PSC, we introduce three scaling dimensions for DLLMs: parallel, diffusion, and sequential. Empirically, while parallel scaling yields consistent improvements, diffusion and sequential scaling are constrained by PSC. Based on these findings, we propose several practical mitigations (parallel-oriented prompting, diffusion early stopping, and parallel scaling) to reduce PSC-induced ineffectiveness and inefficiency.
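The abstract's three scaling dimensions amount to three independent knobs on the decoding budget. Here is a minimal sketch of how they might be exposed in a configuration object; the field names and defaults are illustrative assumptions, not the paper's API:

```python
from dataclasses import dataclass

@dataclass
class DLLMScalingConfig:
    """The three scaling axes distinguished in the abstract. Per its
    findings, raising num_parallel_samples helps consistently, while
    pushing num_diffusion_steps or max_seq_len runs into the PSC."""
    num_parallel_samples: int = 4   # parallel scaling: independent samples, e.g. best-of-n
    num_diffusion_steps: int = 64   # diffusion scaling: denoising/remasking iterations
    max_seq_len: int = 1024         # sequential scaling: length of the reasoning chain
```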
Problem

Research questions and friction points this paper is trying to address.

Identifying the Parallel-Sequential Contradiction limiting DLLMs' reasoning capacity
Analyzing how DLLMs revert to autoregressive behavior in complex tasks
Proposing mitigations to reduce inefficiencies caused by parallel-sequential conflicts
Innovation

Methods, ideas, or system contributions that make the work stand out.

Parallel-oriented prompting reduces sequential constraints (see the prompt sketch after this list)
Diffusion early stopping optimizes token update efficiency
Parallel scaling enhances reasoning capacity without autoregression
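A minimal sketch of the contrast between autoregressive-style and parallel-oriented prompting; the exact prompt wording below is illustrative, not the paper's prompt:

```python
# Autoregressive-style prompting imposes a strict causal order, which the
# paper finds nearly doubles decoding steps through remasking on DLLMs.
AUTOREGRESSIVE_PROMPT = (
    "Solve the problem step by step. Derive each step strictly from the "
    "previous one before writing the next."
)

# Parallel-oriented prompting invites independently decidable parts,
# letting the DLLM fill many positions per step and stop early.
PARALLEL_PROMPT = (
    "Break the problem into independent sub-questions, answer all of them "
    "in parallel, then combine the partial answers into a final result."
)

def build_prompt(question: str, parallel: bool = True) -> str:
    style = PARALLEL_PROMPT if parallel else AUTOREGRESSIVE_PROMPT
    return f"{style}\n\nQuestion: {question}\nAnswer:"

print(build_prompt("What is 17 * 24?"))
```

In the paper's framing, the first style pushes the model back toward one-token-per-step behavior, while the second leaves more positions independently decidable and thus amenable to parallel updates and early stopping.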
👥 Authors
Qiguang Chen (Harbin Institute of Technology) · Chain-of-Thought, Reasoning, Multilingual LLM, Multi-modal LLM
Hanjing Li (LARG, Research Center for Social Computing and Interactive Robotics, Harbin Institute of Technology)
Libo Qin (School of Computer Science and Engineering, Central South University)
Dengyun Peng (Harbin Institute of Technology)
Jinhao Liu (Harbin Institute of Technology) · Chain-of-Thought, Reasoning, Natural Language Processing
Jiangyi Wang (LARG, Research Center for Social Computing and Interactive Robotics, Harbin Institute of Technology)
Chengyue Wu (The University of Hong Kong)
Xie Chen (Shanghai Jiao Tong University)
Yantao Du (ByteDance Seed, China)
Wanxiang Che (Professor, Harbin Institute of Technology) · Natural Language Processing