DiffCoT: Diffusion-styled Chain-of-Thought Reasoning in LLMs

📅 2026-01-07
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes DiffCoT, a diffusion-styled Chain-of-Thought (CoT) reasoning framework that targets two weaknesses of standard CoT: early errors propagate through autoregressive decoding, and the model has limited capacity to correct them afterwards. DiffCoT applies diffusion at the level of reasoning steps through a sliding window, so intermediate steps are generated and retrospectively refined in one process while tokens within each step are still decoded autoregressively. A causal diffusion noise schedule keeps this refinement consistent with the temporal order of the reasoning chain. On three multi-step reasoning benchmarks and across several model backbones, the method consistently outperforms existing CoT preference optimization approaches, with improved robustness and error correction.

📝 Abstract
Chain-of-Thought (CoT) reasoning improves multi-step mathematical problem solving in large language models but remains vulnerable to exposure bias and error accumulation, as early mistakes propagate irreversibly through autoregressive decoding. In this work, we propose DiffCoT, a diffusion-styled CoT framework that reformulates CoT reasoning as an iterative denoising process. DiffCoT integrates diffusion principles at the reasoning-step level via a sliding-window mechanism, enabling unified generation and retrospective correction of intermediate steps while preserving token-level autoregression. To maintain causal consistency, we further introduce a causal diffusion noise schedule that respects the temporal structure of reasoning chains. Extensive experiments on three multi-step CoT reasoning benchmarks across diverse model backbones demonstrate that DiffCoT consistently outperforms existing CoT preference optimization methods, yielding improved robustness and error-correction capability in CoT reasoning.
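
The abstract does not spell out the noise schedule, so the following is only a toy sketch of what a causal, sliding-window schedule over reasoning steps could look like. It is not the authors' implementation. The function names `step_noise` and `sliding_window_schedule`, the linear ramp, and the convention that 1.0 means "not yet generated" and 0.0 means "finalized" are all assumptions made for illustration. The one property it aims to show is the causal one: at every iteration, an earlier reasoning step is never noisier than a later one, and steps inside the window are still being refined while later steps are untouched.

```python
from typing import List


def step_noise(k: int, t: int, window: int) -> float:
    """Hypothetical noise level of reasoning step k at iteration t.

    Assumption: a linear ramp over `window` iterations, clamped to [0, 1].
    1.0 = step not yet generated, 0.0 = step finalized.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    raw = (k + window - t) / window
    return min(1.0, max(0.0, raw))


def sliding_window_schedule(num_steps: int, window: int) -> List[List[float]]:
    """Noise level of every reasoning step at every iteration.

    Row t lists the noise levels of steps 0..num_steps-1 at iteration t.
    The last step is finalized at iteration (num_steps - 1) + window.
    """
    total_iters = num_steps - 1 + window
    return [
        [step_noise(k, t, window) for k in range(num_steps)]
        for t in range(total_iters + 1)
    ]


if __name__ == "__main__":
    # Three reasoning steps, window of two steps under refinement at once.
    for t, row in enumerate(sliding_window_schedule(num_steps=3, window=2)):
        print(t, row)
```

In this sketch the window slides forward by one reasoning step per iteration, so a step that was generated with a mistake is revisited at a lower noise level before it is finalized. In an actual system each row would drive a denoising pass over the steps whose noise level is strictly between 0 and 1.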
Problem

Research questions and friction points this paper is trying to address.

Chain-of-Thought
exposure bias
error accumulation
autoregressive decoding
reasoning robustness
Innovation

Methods, ideas, or system contributions that make the work stand out.

Diffusion-based reasoning
Chain-of-Thought
Iterative denoising
Error correction
Causal consistency