Critic-CoT: Boosting the reasoning abilities of large language model via Chain-of-thoughts Critic

📅 2024-08-29
🏛️ arXiv.org
📈 Citations: 14
Influential: 1
🤖 AI Summary
Existing methods rely on intuitive, System-1-style prompting, which fails to elicit deep reasoning and self-critique in large language models (LLMs), and there has been little systematic investigation of the relationship between critique ability and problem-solving performance. This paper introduces Critic-CoT, a Chain-of-Thought-driven iterative self-critique framework that endows LLMs with System-2-style slow thinking: multi-step reasoning generation, automated self-assessment, error localization, and iterative refinement together enable dynamic correction of reasoning traces. The contributions are threefold: (1) a distantly supervised paradigm for constructing critique data without human annotation; (2) empirical evidence of a strong positive correlation between critique capability and mathematical problem-solving accuracy; and (3) significant accuracy improvements on the GSM8K and MATH benchmarks, achieved both by filtering out erroneous reasoning paths and through multi-round refinement.

📝 Abstract
Self-critique has become a crucial mechanism for enhancing the reasoning performance of LLMs. However, current approaches mainly involve basic prompts for intuitive instance-level feedback, which resembles System-1 processes and limits reasoning capabilities. Moreover, there is a lack of in-depth investigation into the relationship between an LLM's ability to criticize and its task-solving performance. To address these issues, we propose Critic-CoT, a novel framework that pushes LLMs toward System-2-like critic capability. Through a step-wise CoT reasoning paradigm and the automatic construction of distant-supervision data without human annotation, Critic-CoT enables LLMs to engage in slow, analytic self-critique and refinement, thereby improving their reasoning abilities. Experiments on GSM8K and MATH demonstrate that our enhanced model significantly boosts task-solving performance, both by filtering out invalid solutions and through iterative refinement. Furthermore, we investigate the intrinsic correlation between critique and task-solving abilities within LLMs, discovering that these abilities can mutually reinforce each other rather than conflict.
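The critique-and-refine loop the abstract describes can be sketched as follows. This is a minimal illustration with stubbed model calls, not the paper's implementation; all function names (`generate_solution`, `critique_steps`, `refine`) are hypothetical stand-ins for LLM invocations.

```python
# Minimal sketch of an iterative step-wise critique-and-refine loop in the
# spirit of Critic-CoT. Every function below is a stub standing in for an
# LLM call; the names and return formats are illustrative assumptions.

def generate_solution(problem):
    # Stub: produce a list of reasoning steps ending in an answer line.
    return ["step 1: 2 + 3 = 5", "step 2: 5 * 2 = 10", "answer: 10"]

def critique_steps(problem, steps):
    # Stub critic: inspect each step in order (System-2-style step-wise
    # check) and return the index of the first flawed step, or None if
    # the whole trace is judged valid.
    for i, step in enumerate(steps):
        if "ERROR" in step:
            return i
    return None

def refine(problem, steps, flawed_idx):
    # Stub refiner: keep the prefix before the flawed step and
    # regenerate the rest of the trace.
    return steps[:flawed_idx] + ["step (revised)", "answer: 10"]

def critic_cot(problem, max_rounds=3):
    steps = generate_solution(problem)
    for _ in range(max_rounds):
        flawed = critique_steps(problem, steps)
        if flawed is None:   # critic accepts the entire trace
            return steps
        steps = refine(problem, steps, flawed)
    return steps             # best effort after max_rounds

print(critic_cot("What is (2 + 3) * 2?")[-1])  # prints "answer: 10"
```

The key design point mirrored here is that the critic operates per step rather than on the whole answer, so refinement can restart from the exact point of failure instead of regenerating from scratch.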
Problem

Research questions and friction points this paper is trying to address.

Enhancing LLM reasoning via self-critique and refinement
Exploring critique-task performance correlation in LLMs
Automating distant-supervision data for System-2-like critique
Innovation

Methods, ideas, or system contributions that make the work stand out.

Chain-of-thoughts Critic enhances LLM reasoning
Automatic distant-supervision data construction
System-2-like self-critique and refinement
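Beyond refinement, the abstract also mentions using the critic to filter out invalid solutions. A common way to realize this, sketched below under assumed interfaces (the sampler and critic are stubs for LLM calls, not the paper's API), is to sample several candidate solutions, discard those the critic rejects, and majority-vote over the survivors.

```python
from collections import Counter

# Sketch of critic-as-filter: sample candidate solutions, drop the ones
# the critic rejects, then take a majority vote over the remainder.
# Both stubs below stand in for LLM calls; names are illustrative.

def sample_solutions(problem, n=3):
    # Stub sampler: each candidate is a (reasoning, answer) pair.
    return [("2+3=5; 5*2=10", 10),
            ("2+3=6; 6*2=12", 12),
            ("2+3=5; 5*2=10", 10)]

def critic_accepts(problem, reasoning):
    # Stub critic: reject traces containing an obvious arithmetic slip.
    return "2+3=6" not in reasoning

def filtered_vote(problem):
    candidates = sample_solutions(problem)
    valid = [ans for reasoning, ans in candidates
             if critic_accepts(problem, reasoning)]
    if not valid:  # if the critic rejects everything, fall back to all
        valid = [ans for _, ans in candidates]
    return Counter(valid).most_common(1)[0][0]

print(filtered_vote("What is (2 + 3) * 2?"))  # prints 10
```

Compared with plain self-consistency voting, the critic removes wrong reasoning paths before the vote, which is the mechanism the paper credits for part of its GSM8K/MATH gains.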
Xin Zheng
Chinese Information Processing Laboratory, Institute of Software, Chinese Academy of Sciences; University of Chinese Academy of Sciences; Xiaohongshu Inc
Jie Lou
Xiaohongshu
Alignment, RLHF
Boxi Cao
Institute of Software, Chinese Academy of Sciences
Natural Language Processing
Xueru Wen
School of Computer Science and Technology, University of Chinese Academy of Sciences
Natural Language Processing, Alignment, Large Language Models
Yuqiu Ji
Xiaohongshu Inc
Hongyu Lin
Chinese Information Processing Laboratory, Institute of Software, Chinese Academy of Sciences
Yaojie Lu
Institute of Software, Chinese Academy of Sciences
Information Extraction, Large Language Models
Xianpei Han
Chinese Information Processing Laboratory, Institute of Software, Chinese Academy of Sciences
Debing Zhang
Xiaohongshu
Machine Learning, Computer Vision, Deep Learning
Le Sun
Institute of Software, CAS
Information Retrieval, Natural Language Processing