RAG-E: Quantifying Retriever-Generator Alignment and Failure Modes

📅 2026-01-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the lack of transparency in the interaction between retrievers and generators in Retrieval-Augmented Generation (RAG) systems, which hinders their reliable deployment in high-stakes scenarios. The authors propose the RAG-E framework, which introduces the Weighted Attribution-Relevance Gap (WARG) metric to quantify the alignment between retrieved document rankings and their actual usage during generation. In addition, RAG-E combines an enhanced Integrated Gradients method with PMCSHAP, a Monte Carlo-stabilized approximation of Shapley values, to enable end-to-end attribution analysis. Experiments on the TREC CAsT and FoodSafeSum datasets reveal that generators ignore top-ranked documents in 47.4%–66.7% of queries and rely on low-relevance documents in 48.1%–65.9% of cases, demonstrating that RAG performance depends critically on component synergy rather than on the efficacy of individual modules alone.
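The summary centers on the WARG metric as a measure of retriever-generator alignment. The paper's exact formula is not reproduced on this page, so the sketch below is a hypothetical formulation: it treats the retriever's relevance scores and the generator's attribution scores as two distributions over the retrieved documents, then takes a rank-weighted gap between them so that disagreement on top-ranked documents counts more. The function name `warg` and the reciprocal-rank weighting are illustrative assumptions, not the paper's definition.

```python
import numpy as np

def warg(retriever_scores, attribution_scores):
    """Hypothetical Weighted Attribution-Relevance Gap (WARG) sketch.

    Compares the retriever's relevance distribution over documents with
    the generator's attribution distribution: 0 means perfect alignment,
    larger values mean stronger misalignment. The weighting scheme below
    (reciprocal rank) is an assumption; the paper's formula may differ.
    """
    r = np.asarray(retriever_scores, dtype=float)
    a = np.asarray(attribution_scores, dtype=float)
    r = r / r.sum()  # normalize relevance scores to a distribution
    a = a / a.sum()  # normalize attribution scores to a distribution
    # 0-indexed rank of each document under the retriever (double argsort idiom)
    ranks = np.argsort(np.argsort(-r))
    # reciprocal-rank weights: disagreement on top documents counts more
    weights = 1.0 / (ranks + 1)
    weights = weights / weights.sum()
    return float(np.sum(weights * np.abs(a - r)))
```

With such a metric, the paper's failure modes are easy to flag: a query where the generator's highest-attribution document differs from the retriever's top-ranked one exhibits the "top document ignored" pattern.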

📝 Abstract
Retrieval-Augmented Generation (RAG) systems combine dense retrievers and language models to ground LLM outputs in retrieved documents. However, the opacity of how these components interact creates challenges for deployment in high-stakes domains. We present RAG-E, an end-to-end explainability framework that quantifies retriever-generator alignment through mathematically grounded attribution methods. Our approach adapts Integrated Gradients for retriever analysis, introduces PMCSHAP, a Monte Carlo-stabilized Shapley Value approximation, for generator attribution, and proposes the Weighted Attribution-Relevance Gap (WARG) metric to measure how well a generator's document usage aligns with a retriever's ranking. Empirical analysis on TREC CAsT and FoodSafeSum reveals critical misalignments: for 47.4% to 66.7% of queries, generators ignore the retriever's top-ranked documents, while in 48.1% to 65.9% they rely on documents ranked as less relevant. These failure modes demonstrate that RAG output quality depends not solely on individual component performance but on their interplay, which RAG-E makes auditable.
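The abstract describes PMCSHAP as a Monte Carlo-stabilized Shapley Value approximation for attributing the generator's output to individual retrieved documents. The paper's stabilization technique is not detailed here, so the sketch below shows only the standard permutation-sampling Shapley estimator that such a method builds on. The `value_fn` callback, which would score the generator's output given a subset of documents, is a hypothetical interface assumed for illustration.

```python
import random

def mc_shapley(docs, value_fn, n_samples=200, seed=0):
    """Plain Monte Carlo (permutation-sampling) Shapley value estimator.

    `value_fn(subset)` is assumed to score the generated answer when only
    the documents indexed by `subset` are provided to the generator.
    PMCSHAP, per the abstract, adds stabilization on top of sampling;
    this sketch shows only the basic estimator it would refine.
    """
    rng = random.Random(seed)
    n = len(docs)
    phi = [0.0] * n
    for _ in range(n_samples):
        perm = list(range(n))
        rng.shuffle(perm)
        prev = value_fn(frozenset())  # value with no documents
        chosen = set()
        for i in perm:
            chosen.add(i)
            cur = value_fn(frozenset(chosen))
            phi[i] += cur - prev  # marginal contribution of document i
            prev = cur
    return [p / n_samples for p in phi]
```

For an additive value function the estimator recovers each document's weight exactly, which makes it easy to sanity-check before plugging in a real generator-scoring function.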
Problem

Research questions and friction points this paper is trying to address.

Retrieval-Augmented Generation
retriever-generator alignment
failure modes
attribution
explainability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Retrieval-Augmented Generation
Explainability
Integrated Gradients
Shapley Value
Alignment Metric