Self-Verifying Reflection Helps Transformers with CoT Reasoning

📅 2025-10-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
Prior work shows that while large language models (LLMs) frequently perform self-verification within chain-of-thought (CoT) reasoning, their error-detection ability remains limited, and the mechanism by which self-verification improves reasoning performance is poorly understood. Method: a lightweight self-verification and reflection framework, free of natural-language generation, that enables small Transformer models (with only millions of parameters) to jointly perform generative reasoning and discriminative self-verification and correction within CoT. Contribution/Results: theoretical guarantees that, under reasonable boundedness assumptions on verification errors, this mechanism strictly improves performance. Empirically, reinforcement-learning fine-tuning primarily refines superficial reasoning patterns rather than fundamentally reducing logical errors. On integer multiplication and Sudoku, the method achieves reasoning accuracy comparable to that of large-scale models, offering reproducible evidence that self-verification-enabled reflection yields substantive, measurable gains in CoT effectiveness.

📝 Abstract
Advanced large language models (LLMs) frequently reflect within chain-of-thought (CoT) reasoning, self-verifying the correctness of their current solutions and exploring alternatives. However, given recent findings that LLMs detect only a limited fraction of errors in CoTs, how reflection contributes to empirical improvements remains unclear. To analyze this issue, this paper presents a minimalistic reasoning framework that supports basic self-verifying reflection for small transformers without natural language, ensuring analytic clarity and reducing the cost of comprehensive experiments. Theoretically, we prove that self-verifying reflection guarantees improvements if verification errors are properly bounded. Experimentally, we show that tiny transformers, with only a few million parameters, benefit from self-verification in both training and reflective execution, reaching LLM-level performance on integer multiplication and Sudoku. Mirroring LLM results, we find that reinforcement learning (RL) improves in-distribution performance and incentivizes frequent reflection in tiny transformers, yet RL mainly optimizes shallow statistical patterns without faithfully reducing verification errors. In conclusion, integrating generative transformers with discriminative verification inherently facilitates CoT reasoning, regardless of scale or natural language.
Problem

Research questions and friction points this paper is trying to address.

Analyzing how self-verifying reflection improves reasoning in transformers
Developing a minimalistic, natural-language-free framework for transformers with self-verification capabilities
Evaluating the impact of reflection on small transformers' mathematical reasoning performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-verifying reflection framework for small transformers
Bounded verification errors guarantee reasoning improvements
Generative transformers integrated with discriminative verification
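The second Innovation bullet — bounded verification errors guarantee improvement — can be illustrated with a toy Monte Carlo sketch (this is an invented illustration, not the paper's actual framework; all function names and parameter values are assumptions). A generator is correct with probability `p_correct`, a discriminative verifier has false-negative rate `fnr` and false-positive rate `fpr`, and a reflection loop retries whenever the verifier rejects:

```python
import random

random.seed(0)

def attempt(p_correct):
    """One generative solution attempt; correct with probability p_correct."""
    return random.random() < p_correct

def verify(is_correct, fnr, fpr):
    """Discriminative verifier: accepts a correct answer with probability
    1 - fnr, and wrongly accepts an incorrect answer with probability fpr."""
    if is_correct:
        return random.random() >= fnr
    return random.random() < fpr

def reflect_and_solve(p_correct, fnr, fpr, max_tries=3):
    """Generate a solution; on verifier rejection, reflect and retry.
    Returns whether the finally submitted answer is correct."""
    for _ in range(max_tries):
        ok = attempt(p_correct)
        if verify(ok, fnr, fpr):
            return ok                 # verifier accepted: submit this answer
    return attempt(p_correct)         # budget exhausted: submit unverified

def accuracy(solver, trials=20000, **kwargs):
    """Empirical success rate of a solver over many independent trials."""
    return sum(solver(**kwargs) for _ in range(trials)) / trials

base = accuracy(attempt, p_correct=0.6)
refl = accuracy(reflect_and_solve, p_correct=0.6, fnr=0.1, fpr=0.1)
print(f"no reflection: {base:.3f}, with verified reflection: {refl:.3f}")
```

With both error rates at 0.1, accepted answers are correct far more often than raw generation; pushing `fpr` toward 1 makes the verifier rubber-stamp wrong answers and the advantage vanishes, which mirrors the paper's condition that verification errors must be properly bounded for the guarantee to hold.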
Zhongwei Yu
The Hong Kong University of Science and Technology (Guangzhou)
Wannian Xia
Institute of Automation, Chinese Academy of Sciences
Xue Yan
Ph.D. student, Institute of Automation, Chinese Academy of Sciences
Machine Learning
Bo Xu
Institute of Automation, Chinese Academy of Sciences
Haifeng Zhang
Institute of Automation, Chinese Academy of Sciences
Yali Du
Turing Fellow, Associate Professor, King's College London
Multi-Agent Reinforcement Learning, Human-AI Coordination, Alignment, Cooperative AI
Jun Wang
University College London