🤖 AI Summary
Existing chain-of-thought (CoT) fine-tuning research predominantly focuses on technical implementation and lacks systematic analysis grounded in human cognitive mechanisms. This work bridges that gap by introducing, for the first time, a cognitive-dimension classification framework guided by de Bono's "Six Thinking Hats" theory, systematically categorizing and reorganizing CoT fine-tuning methods along core human reasoning processes such as planning, divergent thinking, intuitive judgment, and reflection. Methodologically, the survey covers both supervised and reinforcement fine-tuning, examining how CoT data is modeled according to empirically grounded human reasoning patterns. Empirically, it compiles comprehensive evaluations across mainstream benchmarks and model architectures, and releases a continuously updated GitHub repository of curated resources. The study fills a critical void at the intersection of CoT fine-tuning and cognitive science, establishing a scalable theoretical framework and practical paradigm for endowing large language models with human-like reasoning capabilities.
📝 Abstract
Chain-of-thought (CoT) fine-tuning aims to endow large language models (LLMs) with reasoning capabilities by training them on curated reasoning traces. It leverages both supervised and reinforcement fine-tuning to cultivate human-like reasoning skills in LLMs, such as detailed planning, divergent thinking, intuitive judgment, timely reflection, internal thinking, and fact perception. As CoT fine-tuning has advanced, LLMs have demonstrated substantial improvements in tasks such as mathematical reasoning and code generation. However, existing surveys of CoT fine-tuning focus primarily on technical aspects and overlook a systematic analysis from the perspective of human reasoning mechanisms. Given that the ultimate goal of CoT fine-tuning is to enable LLMs to reason like humans, it is crucial to examine this technique through the lens of human cognition. To fill this gap, we present the first comprehensive survey of CoT fine-tuning grounded in human reasoning theory. Specifically, inspired by the well-known Six Thinking Hats framework, which systematically characterizes common human thinking modes using six metaphorical hats, we classify and examine CoT fine-tuning methods through this lens. Building upon this theory, we also outline promising directions for future research in CoT fine-tuning. In addition, we compile a comprehensive overview of existing datasets and model performance, and maintain a GitHub repository (https://github.com/AI-Chen/Awesome-CoT-Finetuning) that continuously tracks recent advances in this area. We hope this survey will serve as a valuable resource to inspire innovation and foster progress in this rapidly evolving field.