Enabling Scalable Oversight via Self-Evolving Critic

📅 2025-01-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the limited autonomous critique capability of large language models (LLMs) in scenarios where human evaluation is infeasible or supervision signals are absent, this paper proposes a self-evolving critique framework that requires neither external annotations nor a stronger oracle model. The method combines contrastive, step-by-step critique against reference solutions with a self-validation mechanism grounded in correction outcomes. Implemented on Qwen2.5-72B-Instruct, it trains on self-generated synthetic critique data to drive iterative self-improvement. Experiments show improvements of up to 10.3% over baselines on critique-and-correction and error-identification benchmarks, and performance scales positively with both data volume and model size. The core contribution is a fully self-supervised, scalable, outcome-validated closed-loop self-critique system for LLMs.
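The pipeline the summary describes can be sketched as a data-generation loop: the model critiques a flawed solution by contrasting it step by step with a reference solution, and the critique is kept for training only if the correction it proposes reaches the reference answer. The sketch below is a minimal illustration under assumed interfaces; `llm`, `extract_answer`, the prompt wording, and all field names are hypothetical, not from the paper's code.

```python
# Hedged sketch of a SCRIT-style self-training data loop.
# Assumes a generic completion function `llm(prompt) -> str` and a dataset of
# problems paired with a flawed student solution, a reference solution, and a
# reference final answer. All names here are illustrative.

def generate_critique(llm, problem, student_solution, reference_solution):
    """Contrastive, step-by-step critique: the model compares the student's
    steps against a reference solution, flags errors, and writes a correction."""
    prompt = (
        f"Problem: {problem}\n"
        f"Reference solution: {reference_solution}\n"
        f"Student solution: {student_solution}\n"
        "Critique the student solution step by step, contrasting it with the "
        "reference, then write a corrected solution ending with 'Final answer:'."
    )
    return llm(prompt)

def self_validate(critique, reference_answer, extract_answer):
    """Outcome-based self-validation: keep a critique only if the correction
    it proposes reproduces the reference final answer."""
    return extract_answer(critique) == reference_answer

def build_training_set(llm, dataset, extract_answer):
    """Collect validated (student solution -> critique) pairs for fine-tuning."""
    kept = []
    for ex in dataset:
        critique = generate_critique(
            llm, ex["problem"], ex["student_solution"], ex["reference_solution"]
        )
        if self_validate(critique, ex["reference_answer"], extract_answer):
            kept.append({"prompt": ex["student_solution"], "target": critique})
    # Fine-tune the critic on `kept`, then repeat with the improved model
    # to obtain the iterative self-evolution described in the paper.
    return kept
```

Note that critiques without a reference solution at deployment time are unaffected: the reference is only consumed during synthetic data generation, which is what lets the validated pairs serve as supervision without human labels.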

📝 Abstract
Despite their remarkable performance, the development of Large Language Models (LLMs) faces a critical challenge in scalable oversight: providing effective feedback for tasks where human evaluation is difficult or where LLMs outperform humans. While there is growing interest in using LLMs for critique, current approaches still rely on human annotations or more powerful models, leaving the issue of enhancing critique capabilities without external supervision unresolved. We introduce SCRIT (Self-evolving CRITic), a framework that enables genuine self-evolution of critique abilities. Technically, SCRIT self-improves by training on synthetic data, generated by a contrastive-based self-critic that uses reference solutions for step-by-step critique, and a self-validation mechanism that ensures critique quality through correction outcomes. Implemented with Qwen2.5-72B-Instruct, one of the most powerful LLMs, SCRIT achieves up to a 10.3% improvement on critique-correction and error identification benchmarks. Our analysis reveals that SCRIT's performance scales positively with data and model size, outperforms alternative approaches, and benefits critically from its self-validation component.
Problem

Research questions and friction points this paper addresses.

Self-supervised Learning
Large Language Models
Human-level Performance
Innovation

Methods, ideas, or system contributions that distinguish this work.

SCRIT
Self-supervised Learning
Criticism Enhancement