Predicting Retrieval Utility and Answer Quality in Retrieval-Augmented Generation

📅 2026-01-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of accurately predicting the utility of retrieved documents for final answer quality in retrieval-augmented generation (RAG). It introduces two novel prediction tasks—Retrieval Performance Prediction (RPP) and Generation Performance Prediction (GPP)—thereby extending query performance prediction into the RAG framework for the first time. The authors propose a linear regression model that jointly leverages three categories of features: retriever-centric (e.g., query-document relevance), reader-centric (e.g., LLM conditional perplexity), and intrinsic document quality (e.g., readability). Experimental results on the Natural Questions dataset demonstrate that this multi-feature fusion strategy significantly improves the accuracy of both RPP and GPP, offering an effective utility evaluation mechanism for RAG systems.
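One of the reader-centric features described above is the LLM's conditional perplexity of the retrieved context given the query. As a minimal sketch, the paper-level idea reduces to a simple formula once per-token log-probabilities are available from a causal LM; the function name and the hard-coded log-probabilities below are illustrative assumptions, not part of the paper's code.

```python
import math

def conditional_perplexity(token_logprobs):
    """Perplexity of a token sequence from its per-token natural-log
    probabilities, e.g. an LLM scoring the retrieved document conditioned
    on the query. Lower values mean the reader model finds the context
    more predictable -- a reader-centric signal for RPP/GPP.
    (Illustrative helper; the log-probs would come from a real LM.)"""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    # perplexity = exp(-(1/N) * sum of log-probs)
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Hypothetical log-probs for a three-token context:
ppl = conditional_perplexity([-1.0, -2.0, -3.0])  # exp(2.0) ≈ 7.39
```

In a real pipeline the list would be obtained by scoring the concatenation of query and retrieved document with the reader LLM and keeping only the document tokens' log-probabilities.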

📝 Abstract
The quality of answers generated by large language models (LLMs) in retrieval-augmented generation (RAG) is largely influenced by the contextual information contained in the retrieved documents. A key challenge for improving RAG is to predict both the utility of retrieved documents -- quantified as the performance gain from using context over generation without context -- and the quality of the final answers in terms of correctness and relevance. In this paper, we define two prediction tasks within RAG. The first is retrieval performance prediction (RPP), which estimates the utility of retrieved documents. The second is generation performance prediction (GPP), which estimates the final answer quality. We hypothesise that in RAG, the topical relevance of retrieved documents correlates with their utility, suggesting that query performance prediction (QPP) approaches can be adapted for RPP and GPP. Beyond these retriever-centric signals, we argue that reader-centric features, such as the LLM's perplexity of the retrieved context conditioned on the input query, can further enhance prediction accuracy for both RPP and GPP. Finally, we propose that features reflecting query-agnostic document quality and readability can also provide useful signals to the predictions. We train linear regression models with the above categories of predictors for both RPP and GPP. Experiments on the Natural Questions (NQ) dataset show that combining predictors from multiple feature categories yields the most accurate estimates of RAG performance.
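The abstract's core modelling idea is a linear regression that fuses retriever-centric, reader-centric, and document-quality predictors into a single RPP or GPP estimate. A minimal self-contained sketch of that fusion is below; the feature values and target weights are synthetic, and the paper does not specify this exact solver -- ordinary least squares via the normal equations is just one standard way to fit it.

```python
def fit_ols(X, y):
    """Fit ordinary least squares by solving (X^T X) w = X^T y with
    Gaussian elimination. Each row of X already includes a bias term."""
    n, d = len(X), len(X[0])
    A = [[sum(X[k][i] * X[k][j] for k in range(n)) for j in range(d)]
         for i in range(d)]
    b = [sum(X[k][i] * y[k] for k in range(n)) for i in range(d)]
    for col in range(d):                      # forward elimination
        piv = max(range(col, d), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, d):
            f = A[r][col] / A[col][col]
            for c in range(col, d):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    w = [0.0] * d                             # back substitution
    for i in range(d - 1, -1, -1):
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, d))) / A[i][i]
    return w

# Synthetic rows: (retriever relevance, context perplexity, readability).
# The targets follow a known linear rule so the fit is easy to check.
features = [(0.9, 5.0, 0.7), (0.4, 12.0, 0.5), (0.7, 8.0, 0.9),
            (0.2, 20.0, 0.3), (0.6, 10.0, 0.6)]
X = [[1.0, rel, ppl, read] for rel, ppl, read in features]
y = [0.1 + 0.5 * rel - 0.02 * ppl + 0.3 * read
     for rel, ppl, read in features]

weights = fit_ols(X, y)            # recovers [0.1, 0.5, -0.02, 0.3]
predicted = sum(w * f for w, f in zip(weights, [1.0, 0.8, 6.0, 0.8]))
```

The learned weights indicate how each feature category contributes to the predicted utility or answer quality; in the paper's setting, separate regressors would be trained for RPP and GPP targets on NQ.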
Problem

Research questions and friction points this paper is trying to address.

Retrieval-Augmented Generation
Retrieval Utility Prediction
Answer Quality Prediction
Query Performance Prediction
Large Language Models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Retrieval-Augmented Generation
Performance Prediction
Query Performance Prediction
Perplexity-based Features
Document Quality