🤖 AI Summary
Existing methods rely on intuitive, System-1-style prompting, which fails to elicit deep reasoning and self-critique capabilities in large language models (LLMs) and lacks any systematic investigation of the relationship between critique ability and problem-solving performance. This paper introduces Critic-CoT, a Chain-of-Thought-driven, iterative self-critique framework that endows LLMs with System-2-style slow thinking: multi-step reasoning generation, automated self-assessment, error localization, and iterative refinement enable dynamic correction of reasoning traces. The contributions are threefold: (1) a distant-supervision paradigm for constructing critique data without human annotation; (2) empirical evidence of a strong positive correlation between critique capability and mathematical problem-solving accuracy; and (3) significant accuracy improvements on the GSM8K and MATH benchmarks, alongside effective filtering of erroneous reasoning paths and support for multi-round refinement.
📝 Abstract
Self-critique has become a crucial mechanism for enhancing the reasoning performance of LLMs. However, current approaches mainly involve basic prompts eliciting intuitive, instance-level feedback, which resembles System-1 processes and limits their reasoning capabilities. Moreover, there has been little in-depth investigation into the relationship between an LLM's ability to critique and its task-solving performance. To address these issues, we propose Critic-CoT, a novel framework that pushes LLMs toward System-2-like critique capability. Through a step-wise CoT reasoning paradigm and the automatic construction of distant-supervision data without human annotation, Critic-CoT enables LLMs to engage in slow, analytic self-critique and refinement, thereby improving their reasoning abilities. Experiments on GSM8K and MATH demonstrate that our enhanced model significantly boosts task-solving performance, both by filtering out invalid solutions and through iterative refinement. Furthermore, we investigate the intrinsic correlation between critique and task-solving abilities within LLMs, finding that these abilities mutually reinforce rather than conflict with each other.