Response Quality Assessment for Retrieval-Augmented Generation via Conditional Conformal Factuality

📅 2025-06-25

🤖 AI Summary

Existing RAG research focuses on overall question-answering accuracy while neglecting fine-grained factuality assessment of sub-claims within responses; meanwhile, current credibility-enhancement methods either lack statistical guarantees or require ground-truth answer annotations. Method: We propose Conformal-RAG—a novel framework that tightly integrates conditional conformal prediction with RAG’s internal evidence tracing and structured parsing of LLM outputs. Contribution/Results: Conformal-RAG enables sub-claim-level factuality assessment without ground-truth labels and provides rigorous conditional coverage probability guarantees. It supports multi-subdomain adaptive calibration and, at the same confidence level, achieves up to a 60% higher retention rate of high-quality sub-claims compared to baseline approaches applying conformal prediction directly to raw LLM outputs.
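The baseline the summary compares against, conformal prediction applied directly to LLM sub-claims, can be sketched as split-conformal calibration of a single score threshold. This is a minimal sketch, not the paper's implementation. The per-sub-claim confidence scores and the boolean correctness flags on the calibration set are assumptions. The paper states that Conformal-RAG does not need ground-truth answers, but how it derives its calibration signal is not described here. For exchangeable responses, keeping only sub-claims that score above the calibrated threshold leaves no incorrect sub-claim with probability at least 1 - alpha. Any scoring function, including one built from retrieval evidence, could be plugged in as the score.

```python
import math
from typing import List, Sequence, Tuple

# ASSUMPTION: a calibration response is a list of (score, is_correct) pairs,
# one per sub-claim; scores are hypothetical confidences (higher = more trusted).
Response = Sequence[Tuple[float, bool]]


def nonconformity(response: Response) -> float:
    """Smallest threshold that removes every incorrect sub-claim:
    the highest score among incorrect claims (-inf if all are correct)."""
    wrong = [score for score, ok in response if not ok]
    return max(wrong) if wrong else -math.inf


def calibrate_threshold(calibration: List[Response], alpha: float) -> float:
    """Split-conformal quantile: with probability >= 1 - alpha, a new exchangeable
    response filtered at this threshold keeps no incorrect sub-claim."""
    scores = sorted(nonconformity(r) for r in calibration)
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # 1-based rank of the conformal quantile
    if k > n:
        return math.inf  # too few calibration responses: filter out everything
    return scores[k - 1]


def filter_claims(claims: Sequence[Tuple[str, float]], threshold: float) -> List[str]:
    """Keep only sub-claims whose score strictly exceeds the calibrated threshold."""
    return [text for text, score in claims if score > threshold]


if __name__ == "__main__":
    calibration = [
        [(0.9, True), (0.4, False)],
        [(0.8, True), (0.7, True)],
        [(0.6, False), (0.95, True)],
        [(0.3, False)],
    ]
    tau = calibrate_threshold(calibration, alpha=0.25)
    print(tau)  # prints 0.6
    print(filter_claims([("Claim A", 0.9), ("Claim B", 0.6), ("Claim C", 0.5)], tau))
    # prints ['Claim A']
```

The strict inequality in filter_claims matters: a response has no retained incorrect claim exactly when its nonconformity score is at or below the threshold, which is the event the conformal quantile covers.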

📝 Abstract

Existing research on Retrieval-Augmented Generation (RAG) primarily focuses on improving overall question-answering accuracy, often overlooking the quality of sub-claims within generated responses. Recent methods that attempt to improve RAG trustworthiness, such as auto-evaluation metrics, lack probabilistic guarantees or require ground truth answers. To address these limitations, we propose Conformal-RAG, a novel framework inspired by recent applications of conformal prediction (CP) on large language models (LLMs). Conformal-RAG leverages CP and internal information from the RAG mechanism to offer statistical guarantees on response quality. It ensures group-conditional coverage spanning multiple sub-domains without requiring manual labelling of conformal sets, making it suitable for complex RAG applications. Compared to existing RAG auto-evaluation methods, Conformal-RAG offers statistical guarantees on the quality of refined sub-claims, ensuring response reliability without the need for ground truth answers. Additionally, our experiments demonstrate that by leveraging information from the RAG system, Conformal-RAG retains up to 60% more high-quality sub-claims from the response compared to direct applications of CP to LLMs, while maintaining the same reliability guarantee.
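The group-conditional coverage across sub-domains described in the abstract can be approximated, when sub-domains are disjoint, by Mondrian (per-group) split conformal calibration: the same quantile step is run separately inside each sub-domain, so the 1 - alpha guarantee holds within every group rather than only on average. This is a simpler, swapped-in technique; the paper's conditional conformal procedure may support overlapping groups or richer conditioning. The group labels, scores, and correctness flags below are hypothetical inputs.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# ASSUMPTION: each calibration example is (group, sub_claims), where group is a
# hypothetical sub-domain label and sub_claims is a list of (score, is_correct).
Example = Tuple[str, Sequence[Tuple[float, bool]]]


def nonconformity(sub_claims: Sequence[Tuple[float, bool]]) -> float:
    """Highest score among incorrect sub-claims (-inf if none are incorrect)."""
    wrong = [score for score, ok in sub_claims if not ok]
    return max(wrong) if wrong else -math.inf


def conformal_quantile(scores: List[float], alpha: float) -> float:
    """The ceil((n + 1) * (1 - alpha))-th smallest score, or +inf if n is too small."""
    ordered = sorted(scores)
    k = math.ceil((len(ordered) + 1) * (1 - alpha))
    return ordered[k - 1] if k <= len(ordered) else math.inf


def group_thresholds(calibration: List[Example], alpha: float) -> Dict[str, float]:
    """Mondrian calibration: one conformal threshold per (disjoint) sub-domain,
    so coverage holds within each group rather than only marginally."""
    by_group: Dict[str, List[float]] = defaultdict(list)
    for group, sub_claims in calibration:
        by_group[group].append(nonconformity(sub_claims))
    return {g: conformal_quantile(s, alpha) for g, s in by_group.items()}


if __name__ == "__main__":
    calibration = [
        ("medical", [(0.9, True), (0.4, False)]),
        ("medical", [(0.8, True), (0.7, True)]),
        ("medical", [(0.6, False), (0.95, True)]),
        ("legal", [(0.2, False), (0.5, True)]),
        ("legal", [(0.7, False)]),
        ("legal", [(0.9, True)]),
    ]
    print(group_thresholds(calibration, alpha=0.5))
    # prints {'medical': 0.4, 'legal': 0.2}
```

Per-group calibration needs enough examples in every group: with n calibration examples in a group, any alpha below 1/(n + 1) yields an infinite threshold, so every sub-claim in that group would be filtered out.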

Problem

Research questions and friction points this paper is trying to address.

Assessing sub-claim quality in RAG responses

Providing statistical guarantees without ground truth

Improving reliability of auto-evaluated RAG outputs

Innovation

Methods, ideas, or system contributions that make the work stand out.

Applies conformal prediction to RAG response quality

Ensures statistical guarantees without ground-truth answers

Retains up to 60% more high-quality sub-claims than direct CP on LLMs
