🤖 AI Summary
This study systematically evaluates the robustness of retrieval-augmented generation (RAG) systems in healthcare against adversarial evidence: deliberately injected false health information. We design controlled experiments using retrieval pools of helpful, harmful, and adversarial documents, while varying the framing of the user query (consistent, neutral, or inconsistent) to quantitatively measure shifts in large language model (LLM) output alignment with ground-truth answers. Results show that a single adversarial document substantially degrades answer accuracy; however, mixing in helpful evidence mitigates this degradation and preserves RAG robustness. This work is the first to empirically demonstrate that evidence composition, the relative proportion and type of retrieved documents, critically governs RAG safety in high-stakes domains. It provides foundational empirical evidence and a reproducible benchmark for trustworthy RAG design in healthcare, with all data publicly released.
📝 Abstract
Retrieval-augmented generation (RAG) systems provide a method for factually grounding the responses of a large language model (LLM) by supplying retrieved evidence, or context, as support. Guided by this context, RAG systems can reduce hallucinations and expand the ability of LLMs to accurately answer questions outside the scope of their training data. Unfortunately, this design introduces a critical vulnerability: LLMs may absorb and reproduce misinformation present in retrieved evidence. The problem is magnified when the retrieved evidence contains adversarial material explicitly intended to promulgate misinformation. This paper presents a systematic evaluation of RAG robustness in the health domain, examining alignment between model outputs and ground-truth answers. We focus on the health domain because of the potential for harm caused by incorrect responses, as well as the availability of evidence-based ground truth for many common health-related questions. We conduct controlled experiments using common health questions, varying both the type and composition of the retrieved documents (helpful, harmful, and adversarial) and the framing of the question by the user (consistent, neutral, and inconsistent). Our findings reveal that adversarial documents substantially degrade alignment, but robustness can be preserved when helpful evidence is also present in the retrieval pool. These findings offer actionable insights for designing safer RAG systems in high-stakes domains by highlighting the need for retrieval safeguards. To enable reproducibility and facilitate future research, all experimental results are publicly available in our GitHub repository.
https://github.com/shakibaam/RAG_ROBUSTNESS_EVAL
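The controlled design described in the abstract crosses two factors: the composition of retrieved documents and the framing of the user query. As a minimal sketch (the labels below mirror the abstract; the repository's actual condition names and structure may differ), the full condition grid can be enumerated like this:

```python
from itertools import product

# Factor labels taken from the abstract; purely illustrative.
EVIDENCE_TYPES = ["helpful", "harmful", "adversarial"]
QUERY_FRAMINGS = ["consistent", "neutral", "inconsistent"]

def experimental_conditions():
    """Enumerate every (evidence type, query framing) pairing
    in the controlled experimental grid."""
    return [
        {"evidence": e, "framing": f}
        for e, f in product(EVIDENCE_TYPES, QUERY_FRAMINGS)
    ]

conditions = experimental_conditions()
print(len(conditions))  # 3 evidence types x 3 framings = 9 conditions
```

Each condition would then pair a health question, a retrieval pool of the given evidence type (or a mixture, for the composition experiments), and a query phrased with the given framing, with model outputs scored against ground-truth answers.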