Improving Consistency in Retrieval-Augmented Systems with Group Similarity Rewards

📅 2025-10-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
Retrieval-augmented generation (RAG) systems produce inconsistent outputs for semantically equivalent queries, undermining their trustworthiness and deployment safety. To address this, we propose Con-RAG, the first framework to decompose RAG consistency into three evaluation dimensions (retrieval, generation, and end-to-end) and to support end-to-end consistency diagnosis. We design PS-GRPO, a policy optimization algorithm that assigns group-wise similarity rewards over paraphrased query clusters to jointly improve information consistency and answer accuracy without ground-truth supervision. We further introduce scalable approximate reward computation and multi-trajectory training for efficient large-scale optimization. On multi-source question answering, Con-RAG significantly outperforms strong baselines while achieving both high consistency and high accuracy, making it suitable for high-stakes domains such as healthcare and law.

📝 Abstract
RAG systems are increasingly deployed in high-stakes domains where users expect outputs to be consistent across semantically equivalent queries. However, existing systems often exhibit significant inconsistencies due to variability in both the retriever and generator (LLM), undermining trust and reliability. In this work, we focus on information consistency, i.e., the requirement that outputs convey the same core content across semantically equivalent inputs. We introduce a principled evaluation framework that decomposes RAG consistency into retriever-level, generator-level, and end-to-end components, helping identify inconsistency sources. To improve consistency, we propose Paraphrased Set Group Relative Policy Optimization (PS-GRPO), an RL approach that leverages multiple rollouts across paraphrased sets to assign group similarity rewards. We leverage PS-GRPO to achieve Information-Consistent RAG (Con-RAG), training the generator to produce consistent outputs across paraphrased queries and remain robust to retrieval-induced variability. Because exact reward computation over paraphrase sets is computationally expensive, we also introduce a scalable approximation method that retains effectiveness while enabling efficient, large-scale training. Empirical evaluations across short-form, multi-hop, and long-form QA benchmarks demonstrate that Con-RAG significantly improves both consistency and accuracy over strong baselines, even in the absence of explicit ground-truth supervision. Our work provides practical solutions for evaluating and building reliable RAG systems for safety-critical deployments.
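The group similarity reward described in the abstract can be sketched roughly as follows: each rollout in a paraphrase group is rewarded by its mean similarity to the other rollouts, so consistent answers score high and outliers score low. This is a minimal illustration, not the paper's implementation; the token-overlap `similarity` function is an illustrative stand-in for whatever semantic similarity measure the authors use, and the function names are hypothetical.

```python
def similarity(a: str, b: str) -> float:
    """Jaccard token overlap; a stand-in for a learned semantic
    similarity score (e.g., embedding cosine similarity)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

def group_similarity_rewards(outputs: list[str]) -> list[float]:
    """Score each rollout by its mean similarity to every other rollout
    produced for the same paraphrase group. No ground-truth answers are
    needed: agreement within the group is the reward signal."""
    n = len(outputs)
    if n < 2:
        return [0.0] * n
    rewards = []
    for i, out in enumerate(outputs):
        sims = [similarity(out, other)
                for j, other in enumerate(outputs) if j != i]
        rewards.append(sum(sims) / len(sims))
    return rewards
```

For example, two rollouts that convey the same content receive equal, higher rewards than a divergent third rollout, which is the pressure that drives the policy toward consistent answers.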
Problem

Research questions and friction points this paper is trying to address.

Addressing output inconsistency across semantically equivalent queries in RAG systems
Developing evaluation framework to identify retriever and generator inconsistency sources
Proposing reinforcement learning method to train consistent outputs without ground-truth supervision
Innovation

Methods, ideas, or system contributions that make the work stand out.

Group similarity rewards for paraphrased query sets
Scalable approximation for efficient large-scale training
Training generator to be robust to retrieval variability
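The "scalable approximation" bullet refers to avoiding exact reward computation over all rollout pairs, which costs O(n^2) similarity evaluations per group. One way to sketch the idea, under the assumption (not stated in this summary) that the approximation compares each rollout against a single anchor rather than all pairs, is shown below; the anchor choice and function names are illustrative, not the paper's actual method.

```python
def similarity(a: str, b: str) -> float:
    """Jaccard token overlap; stand-in for a semantic similarity model."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

def approx_group_rewards(outputs: list[str], anchor_index: int = 0) -> list[float]:
    """Approximate group rewards by scoring each rollout against one
    anchor rollout instead of all pairs, reducing the number of
    similarity calls from O(n^2) to O(n) per paraphrase group."""
    anchor = outputs[anchor_index]
    return [similarity(out, anchor) for out in outputs]
```

The trade-off is the usual one: a single anchor is cheap but noisier than the full pairwise reward, which is why an approximation like this only makes sense if it empirically retains the consistency signal at scale.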