SCOUT: Teaching Pre-trained Language Models to Enhance Reasoning via Flow Chain-of-Thought

📅 2025-05-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing chain-of-thought (CoT) methods rely on labor-intensive manual annotation of intermediate reasoning steps, limiting generalizability; recursive reasoning approaches often require costly pretraining and lack theoretical modeling of iterative evolution. This paper proposes Flow CoT, a paradigm that models reasoning as the progressive evolution of latent cognitive states, requiring neither explicit CoT supervision nor additional pretraining. It further introduces SCOUT, a lightweight fine-tuning framework that integrates progressive knowledge distillation and cross-step backtracking attention to enable multi-round deep reasoning. Key contributions include: (i) the first formalization of reasoning as cognitive-trajectory evolution; (ii) the first scalable, unsupervised, pretraining-free recursive reasoning method; and (iii) a staged teacher-alignment strategy coupled with a backtracking mechanism that preserves the original computational flow. Evaluated on eight reasoning benchmarks, SCOUT achieves accuracy gains of up to 1.8% under fine-tuning, alongside substantial improvements in explanation quality and reasoning depth.

📝 Abstract
Chain-of-Thought (CoT) prompting improves the reasoning performance of large language models (LLMs) by encouraging step-by-step thinking. However, CoT-based methods depend on annotated intermediate reasoning steps, which limits scalability and generalization. Recent work explores recursive reasoning, where LLMs reuse internal layers across iterations to refine latent representations without explicit CoT supervision. While promising, these approaches often require costly pretraining and lack a principled framework for how reasoning should evolve across iterations. We address this gap by introducing Flow Chain-of-Thought (Flow CoT), a reasoning paradigm that models recursive inference as a progressive trajectory of latent cognitive states. Flow CoT frames each iteration as a distinct cognitive stage, deepening reasoning across iterations without relying on manual supervision. To realize this, we propose SCOUT (Stepwise Cognitive Optimization Using Teachers), a lightweight fine-tuning framework that enables Flow CoT-style reasoning without the need for pretraining. SCOUT uses progressive distillation to align each iteration with a teacher of appropriate capacity, and a cross-attention-based retrospective module that integrates outputs from previous iterations while preserving the model's original computation flow. Experiments across eight reasoning benchmarks show that SCOUT consistently improves both accuracy and explanation quality, achieving gains of up to 1.8% under fine-tuning. Qualitative analyses further reveal that SCOUT enables progressively deeper reasoning across iterations, refining both belief formation and explanation granularity. These results not only validate the effectiveness of SCOUT but also demonstrate the practical viability of Flow CoT as a scalable framework for enhancing reasoning in LLMs.
Problem

Research questions and friction points this paper is trying to address.

Enhancing reasoning in LLMs without explicit CoT supervision
Reducing dependency on costly pretraining for recursive reasoning
Improving accuracy and explanation quality in reasoning tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Flow CoT models recursive inference as progressive cognitive states
SCOUT enables Flow CoT reasoning without pretraining via fine-tuning
Progressive distillation aligns iterations with teacher capacities
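The retrospective ("backtracking") mechanism described above can be pictured as scaled dot-product attention in which the current iteration's latent state queries the states produced by earlier iterations, with the attended summary folded back in residually so the base forward pass is untouched. The sketch below is a minimal, illustrative toy in plain Python, not the authors' implementation; the function names, the single-vector states, and the residual combination are all assumptions made for clarity.

```python
import math

def softmax(xs):
    # numerically stable softmax over a list of scores
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def backtrack_attention(current, past_states):
    """Toy cross-step attention: the current iteration's latent state
    queries the states of earlier iterations; the attended summary is
    added residually, leaving the base computation flow unchanged."""
    if not past_states:
        return current  # first iteration: nothing to look back at
    d = len(current)
    scale = math.sqrt(d)
    scores = [dot(current, p) / scale for p in past_states]
    weights = softmax(scores)
    attended = [
        sum(w * p[i] for w, p in zip(weights, past_states))
        for i in range(d)
    ]
    return [c + a for c, a in zip(current, attended)]

# Toy recursive loop: three "iterations" refining a 4-dim latent state,
# each one attending over the states produced by all earlier iterations.
state = [0.1, 0.2, 0.3, 0.4]
history = []
for step in range(3):
    state = backtrack_attention(state, history)
    history.append(list(state))
```

Keeping the retrospective path purely additive is one simple way to honor the paper's stated constraint that the model's original computation flow is preserved: with no past states the function is the identity, so iteration one behaves exactly like the unmodified model.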
👥 Authors
Guanghao Li
Fudan University
Graphics
Wenhao Jiang
GML, Tencent, PolyU
Computer Vision, Machine Learning, Foundation Models
Mingfeng Chen
SIGS, Tsinghua University
Yan Li
The Hong Kong University of Science and Technology
Hao Yu
SIGS, Tsinghua University
Shuting Dong
Tsinghua University
Computer Vision, Time Series Prediction
Tao Ren
Peking University
Foundation Models, Optimization, Reinforcement Learning
Ming Tang
Southern University of Science and Technology
Chun Yuan
SIGS, Tsinghua University