SafeRAG: Benchmarking Security in Retrieval-Augmented Generation of Large Language Model

📅 2025-01-28
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Retrieval-Augmented Generation (RAG) systems face severe security risks due to external knowledge injection, yet no comprehensive threat taxonomy or standardized evaluation framework exists. Method: We propose the first systematic RAG security threat taxonomy—covering silver-noise poisoning, context-conflict attacks, soft-advertising injections, and white-service denial—and build SafeRAG, the first high-quality, human-annotated benchmark for RAG security evaluation. We further design an adversarial assessment framework covering 14 mainstream RAG components. Results: Empirical evaluation reveals that state-of-the-art retrievers, filters, and large language models remain highly vulnerable to basic poisoning attacks; even minimal adversarial inputs consistently bypass existing defenses, significantly degrading generation quality and service reliability. This work establishes a foundational taxonomy, a standardized benchmark, and empirical evidence to advance systematic RAG security research.

Technology Category

Application Category

📝 Abstract
The indexing-retrieval-generation paradigm of retrieval-augmented generation (RAG) has been highly successful in solving knowledge-intensive tasks by integrating external knowledge into large language models (LLMs). However, the incorporation of external and unverified knowledge increases the vulnerability of LLMs because attackers can perform attack tasks by manipulating knowledge. In this paper, we introduce a benchmark named SafeRAG designed to evaluate the RAG security. First, we classify attack tasks into silver noise, inter-context conflict, soft ad, and white Denial-of-Service. Next, we construct RAG security evaluation dataset (i.e., SafeRAG dataset) primarily manually for each task. We then utilize the SafeRAG dataset to simulate various attack scenarios that RAG may encounter. Experiments conducted on 14 representative RAG components demonstrate that RAG exhibits significant vulnerability to all attack tasks and even the most apparent attack task can easily bypass existing retrievers, filters, or advanced LLMs, resulting in the degradation of RAG service quality. Code is available at: https://github.com/IAAR-Shanghai/SafeRAG.
Problem

Research questions and friction points this paper is trying to address.

Language Models
Retrieval-Augmented Generation (RAG)
Security Vulnerabilities
Innovation

Methods, ideas, or system contributions that make the work stand out.

SafeRAG
RAGModelSecurity
AttackVectors
🔎 Similar Papers
No similar papers found.
X
Xun Liang
School of Information, Renmin University of China, Beijing, China
S
Simin Niu
School of Information, Renmin University of China, Beijing, China
Zhiyu Li
Zhiyu Li
Tianjin University
Robust controlattitude control
S
Sensen Zhang
School of Information, Renmin University of China, Beijing, China
H
Hanyu Wang
School of Information, Renmin University of China, Beijing, China
Feiyu Xiong
Feiyu Xiong
MemTensor (Shanghai) Technology Co., Ltd.
Machine LearningNLPLLM
Jason Zhaoxin Fan
Jason Zhaoxin Fan
Beihang University; Psyche AI Inc
multi-modal LLMsavatarsand embodied AI
B
Bo Tang
Institute for Advanced Algorithms Research, Shanghai, China
S
Shichao Song
School of Information, Renmin University of China, Beijing, China
M
Mengwei Wang
School of Information, Renmin University of China, Beijing, China
J
Jiawei Yang
School of Information, Renmin University of China, Beijing, China