🤖 AI Summary
Existing methods rely on intuitive, System-1-style prompting, which fails to elicit deep reasoning and self-critique capabilities in large language models (LLMs) and lacks any systematic investigation of the relationship between critique ability and problem-solving performance. This paper introduces Critic-CoT, a Chain-of-Thought-driven, iterative self-critique framework that endows LLMs with System-2-style slow thinking: multi-step reasoning generation, automated self-assessment, error localization, and iterative refinement enable dynamic correction of reasoning traces. The contributions are threefold: (1) a distant-supervision paradigm for constructing critique data without human annotation; (2) empirical evidence of a strong positive correlation between critique capability and mathematical problem-solving accuracy; and (3) significant accuracy improvements on the GSM8K and MATH benchmarks, alongside effective filtering of erroneous reasoning paths and support for multi-round refinement.
📝 Abstract
Self-critique has become a crucial mechanism for enhancing the reasoning performance of LLMs. However, current approaches mainly involve basic prompts eliciting intuitive, instance-level feedback, which resembles System-1 processes and limits their reasoning capabilities. Moreover, there has been little in-depth investigation into the relationship between an LLM's ability to critique and its task-solving performance. To address these issues, we propose Critic-CoT, a novel framework that pushes LLMs toward System-2-like critique capability. Through a step-wise CoT reasoning paradigm and the automatic construction of distant-supervision data without human annotation, Critic-CoT enables LLMs to engage in slow, analytic self-critique and refinement, thereby improving their reasoning abilities. Experiments on GSM8K and MATH demonstrate that our enhanced model significantly boosts task-solving performance, both by filtering out invalid solutions and through iterative refinement. Furthermore, we investigate the intrinsic correlation between critique and task-solving abilities within LLMs, finding that these abilities mutually reinforce rather than conflict with each other.