🤖 AI Summary
This work addresses the problem of irrelevant retrieved passages degrading large language model (LLM) performance in retrieval-augmented generation (RAG). We propose a quantifiable measure of a passage's "distracting effect" with respect to a query and an LLM, moving beyond the binary classification of irrelevant passages as either completely unrelated or distracting. Methodologically, we quantify the distracting effect by measuring the response bias a passage induces in the LLM, identify hard distracting passages via adversarial retrieval mining, and design a targeted fine-tuning strategy that explicitly leverages these hard examples. Our key contribution is a framework that both identifies and productively utilizes hard distracting passages to improve RAG robustness. Empirical evaluation across multiple LLMs demonstrates that the distracting-effect measure is robust across models and that the proposed fine-tuning improves answering accuracy by up to 7.5% over counterparts fine-tuned on conventional RAG datasets.
📝 Abstract
A well-known issue with Retrieval Augmented Generation (RAG) is that retrieved passages that are irrelevant to the query sometimes distract the answer-generating LLM, causing it to provide an incorrect response. In this paper, we shed light on this core issue and formulate the distracting effect of a passage with respect to a query (and an LLM). We provide a quantifiable measure of the distracting effect of a passage and demonstrate its robustness across LLMs. Our research introduces novel methods for identifying and using hard distracting passages to improve RAG systems. By fine-tuning LLMs with these carefully selected distracting passages, we achieve up to a 7.5% increase in answering accuracy compared to counterparts fine-tuned on conventional RAG datasets. Our contribution is two-fold: first, we move beyond the simple binary classification of irrelevant passages as either completely unrelated or distracting, and second, we develop and analyze multiple methods for finding hard distracting passages. To our knowledge, no other research has provided such a comprehensive framework for identifying and utilizing hard distracting passages.
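To make the notion concrete, one plausible way to operationalize a passage's distracting effect is to compare the LLM's confidence in the gold answer with and without the passage in context: a passage is distracting to the extent that it suppresses the correct answer. The sketch below is illustrative only; the `answer_prob` interface and the toy model are assumptions, not the paper's actual formulation.

```python
from typing import Callable, Optional

# Hypothetical interface (an assumption, not the paper's API):
# answer_prob(query, passage, gold) returns the LLM's probability of
# producing the gold answer, with `passage=None` meaning no retrieved context.
AnswerProb = Callable[[str, Optional[str], str], float]


def distracting_effect(answer_prob: AnswerProb,
                       query: str, passage: str, gold: str) -> float:
    """Relative drop in the LLM's confidence in the gold answer
    caused by adding the passage to the context.

    ~0.0 -> passage is harmless (completely unrelated, or even helpful)
    ~1.0 -> passage fully suppresses the correct answer (hard distractor)
    """
    p_without = answer_prob(query, None, gold)      # no-passage baseline
    p_with = answer_prob(query, passage, gold)      # passage in context
    return max(0.0, p_without - p_with) / max(p_without, 1e-9)


# Toy stand-in model for demonstration only: a passage mentioning "Mars"
# confuses it; anything else leaves its confidence unchanged.
def toy_answer_prob(query: str, passage: Optional[str], gold: str) -> float:
    if passage is not None and "Mars" in passage:
        return 0.2
    return 0.8


score = distracting_effect(
    toy_answer_prob, "What is the capital of France?",
    "Mars has two small moons, Phobos and Deimos.", "Paris")
# (0.8 - 0.2) / 0.8 = 0.75 for this toy model
```

Under such a formulation, mining "hard" distracting passages would amount to searching the retrieval corpus for passages that maximize this score for a given query.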