CRITIC-R1: Learning Structured Critics for Retrieval-Augmented Generation

📅 2026-05-28

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing retrieval-augmented generation (RAG) systems remain prone to hallucinations and reasoning errors, and the external critics meant to correct them tend to give coarse, unstructured feedback and intervene too aggressively, which makes the resulting corrections unreliable. To address this, the authors propose CRITIC-R1, a framework that formalizes RAG critique as an explicit, multi-dimensional error diagnosis task covering verdict, error location, reasoning analysis, and fix generation. The critic is trained with GRPO-based reinforcement learning under process-level supervision collected from external LLM teacher models, using two reward functions: Conservative Judgement Alignment (CJA), which encourages calibrated verdicts and curbs over-intervention, and Diagnostic Quality Alignment (DQA), which improves fine-grained diagnostic feedback through gated rewards. Across five question-answering benchmarks, CRITIC-R1 consistently improves answer quality over strong RAG baselines.

📝 Abstract

Retrieval-augmented generation (RAG) improves knowledge-intensive question answering by incorporating external evidence. However, existing RAG methods still suffer from hallucinations and subtle reasoning errors. Recent studies introduce external critics to refine RAG outputs, yet they often provide coarse-grained and weakly structured feedback, exhibit over-aggressive intervention, and lead to noisy and unreliable refinement, limiting their effectiveness for correction. To tackle these issues, we propose CRITIC-R1, a structured critic framework that formulates and learns RAG critique as an explicit error diagnosis problem using reinforcement learning (RL). Our framework categorizes common RAG errors into multiple diagnostic dimensions, including verdict, error location, reasoning analysis, and fix generation. To learn these capabilities, we design two reward functions: Conservative Judgement Alignment (CJA) first encourages calibrated high-level judgements while mitigating the over-aggressive phenomenon, whereas Diagnostic Quality Alignment (DQA) further improves fine-grained diagnostic feedback through gated rewards. We train the critic model using GRPO-based RL with process-level supervision collected from external LLM teacher models. Experiments across five QA benchmarks show that CRITIC-R1 consistently improves answer quality over strong RAG baselines. Our source code is available at https://anonymous.4open.science/r/critic-r1-FCB0
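
To make the training signal described in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not give the reward formulas, so everything here is an assumption made for illustration: the `Critique` fields, the false-alarm penalty in `cja_reward`, the gating condition in `dqa_reward`, and the externally supplied `diagnostic_score` (for example, a teacher-LLM rating in [0, 1]). The only part taken from GRPO as commonly described is the group-relative normalization of rewards; the choice of population standard deviation is also an assumption.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Critique:
    """Hypothetical structured critique with the four dimensions named in the abstract."""
    verdict: str         # "correct" or "incorrect"
    error_location: str  # where the error sits; empty if none is claimed
    analysis: str        # reasoning analysis of the error
    fix: str             # proposed repair


def cja_reward(pred: Critique, gold_verdict: str, false_alarm_penalty: float = 1.0) -> float:
    """Assumed conservative verdict reward: flagging a correct answer is penalized."""
    if pred.verdict == gold_verdict:
        return 1.0
    if gold_verdict == "correct":  # over-aggressive intervention
        return -false_alarm_penalty
    return 0.0  # missed error: no reward, no extra penalty (assumption)


def dqa_reward(pred: Critique, gold_verdict: str, diagnostic_score: float) -> float:
    """Assumed gate: diagnostic quality counts only when a real error was correctly flagged."""
    gate = pred.verdict == gold_verdict == "incorrect"
    return diagnostic_score if gate else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: normalize each reward against its sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four sampled critiques for one answer that really is wrong.
    group = [
        (Critique("incorrect", "step 2", "misread passage", "use passage 3"), 0.8),
        (Critique("incorrect", "step 1", "vague", "retry"), 0.2),
        (Critique("correct", "", "", ""), 0.0),
        (Critique("incorrect", "step 2", "wrong entity", "replace entity"), 0.6),
    ]
    rewards = [cja_reward(c, "incorrect") + dqa_reward(c, "incorrect", s) for c, s in group]
    print(group_relative_advantages(rewards))
```

Gating the diagnostic term on the verdict means a critique cannot collect reward for a detailed diagnosis of an error that does not exist, which is one plausible way to discourage over-intervention.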

Problem

Research questions and friction points this paper is trying to address.

Retrieval-Augmented Generation

hallucinations

reasoning errors

structured critics

error diagnosis

Innovation

Methods, ideas, or system contributions that make the work stand out.

structured critic

retrieval-augmented generation

reinforcement learning