Evaluation of Attribution Bias in Retrieval-Augmented Large Language Models

📅 2024-10-16
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
This work investigates the sensitivity and bias of large language models (LLMs) in retrieval-augmented generation (RAG) when attributing answers to source documents carrying author metadata. Using a counterfactual evaluation framework, the authors explicitly inject synthetic author labels into source documents and systematically quantify attribution stability and author-type preferences across three mainstream LLMs. They introduce and formalize two novel metrics—*attribution sensitivity* and *author attribution bias*—to measure how author metadata influences model attribution behavior. Results reveal that author metadata significantly affects attribution quality, inducing fluctuations of 3%–18%, and that LLMs exhibit a strong preference for text labeled as “human-authored”—a competing hypothesis to prior findings that LLM-generated content may be preferred over human-written content. This study extends the analysis of LLM vulnerabilities to metadata-driven biases and provides empirical grounding and methodological tools for designing trustworthy RAG systems.

📝 Abstract
Attributing answers to source documents is an approach used to enhance the verifiability of a model's output in retrieval-augmented generation (RAG). Prior work has mainly focused on improving and evaluating the attribution quality of large language models (LLMs) in RAG, but this may come at the expense of inducing biases in the attribution of answers. We define and examine two aspects in the evaluation of LLMs in RAG pipelines, namely attribution sensitivity and bias with respect to authorship information. We explicitly inform an LLM about the authors of source documents, instruct it to attribute its answers, and analyze (i) how sensitive the LLM's output is to the author of source documents, and (ii) whether the LLM exhibits a bias towards human-written or AI-generated source documents. We design an experimental setup in which we use counterfactual evaluation to study three LLMs in terms of their attribution sensitivity and bias in RAG pipelines. Our results show that adding authorship information to source documents can significantly change the attribution quality of LLMs by 3% to 18%. Moreover, we show that LLMs can have an attribution bias towards explicit human authorship, which can serve as a competing hypothesis for findings of prior work showing that LLM-generated content may be preferred over human-written content. Our findings indicate that metadata of source documents can influence LLMs' trust and how they attribute their answers. Furthermore, our research highlights attribution bias and sensitivity as a novel aspect of brittleness in LLMs.
Problem

Research questions and friction points this paper is trying to address.

Evaluating attribution bias in LLMs with authorship metadata
Assessing sensitivity and bias in RAG pipeline attributions
Measuring impact of document metadata on LLM trust
Innovation

Methods, ideas, or system contributions that make the work stand out.

Evaluates attribution sensitivity and bias in LLMs
Uses counterfactual evaluation with authorship metadata
Measures bias towards human vs AI-generated sources
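The counterfactual setup above can be sketched in a few lines: hold each source document fixed, flip only its injected author label, and measure (a) how often the model's attribution changes and (b) which author type it prefers. This is a minimal illustration, not the paper's implementation; `query_llm`, the label strings, and both metric definitions are assumptions made for the sketch.

```python
AUTHOR_LABELS = ["human-authored", "AI-generated"]

def with_author_label(doc: str, label: str) -> str:
    """Prepend a synthetic author-metadata line to a source document."""
    return f"[Author: {label}]\n{doc}"

def attribution_sensitivity(docs, query, query_llm):
    """Fraction of documents whose attribution changes when only the
    author label is flipped (query and document text held fixed)."""
    flips = 0
    for doc in docs:
        answers = {
            label: query_llm(query, with_author_label(doc, label))
            for label in AUTHOR_LABELS
        }
        # A counterfactual pair is "sensitive" if the outputs differ.
        if answers["human-authored"] != answers["AI-generated"]:
            flips += 1
    return flips / len(docs)

def author_attribution_bias(attributions):
    """Signed preference over the labels the model chose to cite:
    +1.0 = always cites human-labeled docs, -1.0 = always AI-labeled."""
    human = sum(1 for a in attributions if a == "human-authored")
    ai = sum(1 for a in attributions if a == "AI-generated")
    total = human + ai
    return (human - ai) / total if total else 0.0
```

In this framing, a sensitivity near 0 means the model ignores author metadata, while a large positive bias score would reproduce the paper's finding of a preference for explicitly human-authored sources.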