RAG LLMs are Not Safer: A Safety Analysis of Retrieval-Augmented Generation for Large Language Models

📅 2025-04-25
🤖 AI Summary
This paper identifies a systematic safety-degradation effect of Retrieval-Augmented Generation (RAG) on large language models (LLMs), a phenomenon that persists even when both the LLM and the retrieved documents are safety-aligned. Through multi-dimensional safety evaluations of 11 mainstream LLMs under RAG and non-RAG configurations, the study empirically demonstrates that RAG increases the average probability of harmful outputs and shifts the distribution of risk categories. Moreover, standard red-teaming methods show a 37% average decline in detection efficacy in RAG settings, revealing that they transfer poorly to RAG pipelines. The contributions are threefold: (1) the first systematic empirical evidence that RAG can *undermine*, rather than enhance, LLM safety; (2) identification of a RAG-specific vulnerability wherein "safe model + safe context ≠ safe output"; and (3) a call for, and foundational rationale toward, RAG-native safety evaluation paradigms.

📝 Abstract
Efforts to ensure the safety of large language models (LLMs) include safety fine-tuning, evaluation, and red teaming. However, despite the widespread use of the Retrieval-Augmented Generation (RAG) framework, AI safety work focuses on standard LLMs, which means we know little about how RAG use cases change a model's safety profile. We conduct a detailed comparative analysis of RAG and non-RAG frameworks with eleven LLMs. We find that RAG can make models less safe and change their safety profile. We explore the causes of this change and find that even combinations of safe models with safe documents can cause unsafe generations. In addition, we evaluate some existing red teaming methods for RAG settings and show that they are less effective than when used for non-RAG settings. Our work highlights the need for safety research and red-teaming methods specifically tailored for RAG LLMs.

Problem

Research questions and friction points this paper is trying to address.

Analyzes safety risks of Retrieval-Augmented Generation in LLMs
Compares RAG and non-RAG frameworks across eleven LLMs
Evaluates effectiveness of red teaming methods for RAG

Innovation

Methods, ideas, or system contributions that make the work stand out.

Comparative analysis of RAG and non-RAG frameworks
Evaluating red teaming methods for RAG settings
Investigating safe model and document combinations
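The comparative setup described above can be sketched in a few lines: build matched RAG and non-RAG prompts for the same query set, collect model outputs under each configuration, and compare unsafe-response rates. This is a hypothetical illustration, not the authors' code; `toy_model` and `toy_judge` are stand-ins for a real LLM and a real safety classifier.

```python
# Hypothetical sketch of a RAG vs non-RAG safety comparison (assumed names,
# not from the paper). A real run would replace toy_model with an LLM call
# and toy_judge with a safety classifier or red-teaming judge.

def build_prompt(query, documents=None):
    """Wrap a query as a plain prompt (non-RAG) or a RAG prompt with context."""
    if documents is None:
        return query  # non-RAG: the model sees only the user query
    context = "\n".join(f"[doc {i}] {d}" for i, d in enumerate(documents, 1))
    return f"Answer using the context below.\n{context}\n\nQuestion: {query}"

def unsafe_rate(responses, is_unsafe):
    """Fraction of responses flagged unsafe by the judge `is_unsafe`."""
    return sum(1 for r in responses if is_unsafe(r)) / len(responses)

# Toy stand-ins so the sketch runs end to end.
def toy_model(prompt):
    # Pretend the model becomes unsafe whenever retrieved context is present.
    return "UNSAFE answer" if "[doc" in prompt else "safe refusal"

def toy_judge(response):
    return response.startswith("UNSAFE")

queries = ["how to do X?", "how to do Y?"]
docs = [["benign snippet A"], ["benign snippet B"]]

non_rag = [toy_model(build_prompt(q)) for q in queries]
rag = [toy_model(build_prompt(q, d)) for q, d in zip(queries, docs)]

print(unsafe_rate(non_rag, toy_judge))  # 0.0
print(unsafe_rate(rag, toy_judge))      # 1.0
```

The point of the paired construction is that the query set is held fixed, so any gap between the two unsafe rates is attributable to the RAG configuration itself, mirroring the paper's finding that adding even benign documents can change a model's safety profile.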