Source Attribution in Retrieval-Augmented Generation

📅 2025-07-06
🤖 AI Summary
Problem: Document attribution in RAG systems suffers from high computational overhead and poor interpretability. Method: This paper systematically extends Shapley values to document-level attribution and proposes a low-overhead approximation algorithm that quantifies document importance under redundancy, complementarity, and synergy. It integrates the SHAP framework with LLM-based utility evaluation and designs a lightweight, controllable utility function via targeted LLM interactions, bypassing exhaustive subset enumeration. Contribution/Results: Experiments demonstrate that the approximation reduces LLM calls by over 80% while preserving high fidelity in identifying critical documents. The resulting attributions are interpretable and practical to deploy. This work presents an efficient, theoretically grounded, and empirically validated document-level attribution approach for interpretable RAG.

📝 Abstract
While attribution methods such as Shapley values are widely used to explain the importance of features or training data in traditional machine learning, their application to Large Language Models (LLMs), particularly within Retrieval-Augmented Generation (RAG) systems, is nascent and challenging. The primary obstacle is the substantial computational cost: each utility function evaluation involves an expensive LLM call, which incurs direct monetary and time costs. This paper investigates the feasibility and effectiveness of adapting Shapley-based attribution to identify influential retrieved documents in RAG. We compare Shapley with more computationally tractable approximations and with some existing attribution methods for LLMs. Our work aims to: (1) systematically apply established attribution principles to the RAG document-level setting; (2) quantify how well SHAP approximations can mirror exact attributions while minimizing costly LLM interactions; and (3) evaluate their practical explainability in identifying critical documents, especially under complex inter-document relationships such as redundancy, complementarity, and synergy. This study seeks to bridge the gap between powerful attribution techniques and the practical constraints of LLM-based RAG systems, offering insights into achieving reliable and affordable RAG explainability.
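
To make the document-level setting concrete, here is a minimal sketch of exact Shapley attribution over a small set of retrieved documents. It is an illustration under stated assumptions, not the paper's implementation: `toy_utility` is a hypothetical stand-in for the LLM-based utility call, and the document labels A to E and the scores 0.6 and 0.4 are invented to show redundancy (A, B), synergy (C, D), and an irrelevant document (E).

```python
from itertools import combinations
from math import factorial


def toy_utility(docs: frozenset) -> float:
    """Hypothetical stand-in for an expensive LLM-based utility evaluation."""
    score = 0.0
    # Redundancy: A and B carry the same fact, so either one is enough.
    if "A" in docs or "B" in docs:
        score += 0.6
    # Synergy: C and D only contribute when both are present.
    if "C" in docs and "D" in docs:
        score += 0.4
    # E is irrelevant and never changes the score.
    return score


def exact_shapley(players, utility):
    """Exact Shapley values by enumerating every subset of the other players."""
    n = len(players)
    values = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (utility(s | {p}) - utility(s))
        values[p] = total
    return values


documents = ["A", "B", "C", "D", "E"]
shapley = exact_shapley(documents, toy_utility)
for doc in documents:
    print(doc, round(shapley[doc], 3))
```

In this toy game the redundant pair splits the 0.6 credit evenly, the synergistic pair splits the 0.4 credit evenly, and E receives zero. The enumeration visits every subset of the other documents for each document, so the number of utility evaluations grows exponentially with the number of retrieved documents. That is the cost the abstract describes when each evaluation is an LLM call.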
Problem

Research questions and friction points this paper is trying to address.

Adapt Shapley-based attribution for RAG document influence
Compare Shapley with approximations to reduce LLM costs
Evaluate explainability under complex document relationships
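
One generic way to trade accuracy for fewer utility calls is permutation sampling with caching, sketched below. This is a standard Monte Carlo estimator and is not necessarily one of the approximations evaluated in the paper. The utility function, the document labels, and the parameters `num_permutations` and `seed` are hypothetical choices made for illustration.

```python
import random


def make_counting_utility():
    """Build a cached toy utility and a counter of distinct evaluations.

    The scoring rule is a hypothetical stand-in for an LLM call.
    """
    calls = {"n": 0}
    cache = {}

    def utility(docs: frozenset) -> float:
        if docs not in cache:
            calls["n"] += 1  # count only evaluations that miss the cache
            score = 0.0
            if "A" in docs or "B" in docs:  # redundant pair
                score += 0.6
            if "C" in docs and "D" in docs:  # synergistic pair
                score += 0.4
            cache[docs] = score
        return cache[docs]

    return utility, calls


def permutation_shapley(players, utility, num_permutations, seed=0):
    """Estimate Shapley values by averaging marginal gains over random orders."""
    rng = random.Random(seed)
    totals = {p: 0.0 for p in players}
    for _ in range(num_permutations):
        order = list(players)
        rng.shuffle(order)
        coalition = frozenset()
        previous = utility(coalition)
        for p in order:
            coalition = coalition | {p}
            current = utility(coalition)
            totals[p] += current - previous  # marginal gain of adding p
            previous = current
    return {p: total / num_permutations for p, total in totals.items()}


documents = ["A", "B", "C", "D", "E"]
utility, calls = make_counting_utility()
estimates = permutation_shapley(documents, utility, num_permutations=4)
print(estimates)
print("distinct utility evaluations:", calls["n"])
```

With 4 permutations over 5 documents, the estimator needs at most 21 distinct evaluations (the empty set plus 5 prefixes per permutation), compared with all 32 subsets for exact enumeration. The gap widens quickly as the number of documents grows. The estimates still sum to the utility of the full document set, because each permutation's marginal gains telescope, but individual values are noisy when few permutations are sampled.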
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adapt Shapley-based attribution for RAG
Compare Shapley with tractable approximations
Evaluate explainability under document relationships
Ikhtiyor Nematov
Université Libre de Bruxelles, Belgium
Tarik Kalai
Université Libre de Bruxelles, Belgium
Elizaveta Kuzmenko
Université Libre de Bruxelles, Belgium
Gabriele Fugagnoli
Université Libre de Bruxelles, Belgium; University of Padova, Italy
Dimitris Sacharidis
Université Libre de Bruxelles (ULB)
Responsible AI
Katja Hose
Professor, TU Wien
Graph Data Management, Knowledge Graphs, Databases, Semantic Web
Tomer Sagi
Associate Professor, Aalborg University
Schema Matching, Entity Resolution, Data Cleansing