The Silent Saboteur: Imperceptible Adversarial Attacks against Black-Box Retrieval-Augmented Generation Systems

📅 2025-05-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work introduces the first imperceptible adversarial attack paradigm targeting the “retrieval → generation” pipeline in black-box retrieval-augmented generation (RAG) systems. To manipulate RAG outputs while remaining undetectable to humans, the authors propose ReGENT—a reinforcement learning framework that jointly optimizes three objectives: retrieval relevance, generation misleadingness, and textual naturalness. ReGENT enables end-to-end perturbation optimization under black-box constraints via differentiable retrieval approximation and inverse interaction modeling. Evaluated on a newly constructed factual/non-factual question-answering benchmark, ReGENT achieves significantly higher attack success rates than prior methods across mainstream RAG systems, using minimal text perturbations (average character-level modification rate < 0.8%). Crucially, perturbed inputs retain high naturalness and readability, ensuring stealthiness without compromising linguistic fluency.

📝 Abstract
We explore adversarial attacks against retrieval-augmented generation (RAG) systems to identify their vulnerabilities. We focus on generating human-imperceptible adversarial examples and introduce a novel imperceptible retrieve-to-generate attack against RAG. This task aims to find imperceptible perturbations that retrieve a target document, originally excluded from the initial top-$k$ candidate set, in order to influence the final answer generation. To address this task, we propose ReGENT, a reinforcement learning-based framework that tracks interactions between the attacker and the target RAG and continuously refines attack strategies based on relevance-generation-naturalness rewards. Experiments on newly constructed factual and non-factual question-answering benchmarks demonstrate that ReGENT significantly outperforms existing attack methods in misleading RAG systems with small imperceptible text perturbations.
Problem

Research questions and friction points this paper is trying to address.

Identify vulnerabilities in retrieval-augmented generation systems
Generate human-imperceptible adversarial examples for RAG
Influence answer generation via imperceptible text perturbations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Human-imperceptible adversarial example generation
Reinforcement learning-based attack framework
Relevance-generation-naturalness reward strategy
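The paper's reward strategy combines three signals: retrieval relevance, generation misleadingness, and textual naturalness. As a purely illustrative toy sketch (not the authors' implementation), the three scores might be merged into a single scalar for an RL attacker via a weighted sum; the weights, value ranges, and function names below are all assumptions:

```python
def combined_reward(relevance: float,
                    misleading: float,
                    naturalness: float,
                    w_rel: float = 0.4,
                    w_mis: float = 0.4,
                    w_nat: float = 0.2) -> float:
    """Toy weighted sum of relevance, generation-misleading, and
    naturalness scores, each assumed to lie in [0, 1]. Weights are
    illustrative, not taken from the paper."""
    for name, score in (("relevance", relevance),
                        ("misleading", misleading),
                        ("naturalness", naturalness)):
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"{name} score must be in [0, 1], got {score}")
    return w_rel * relevance + w_mis * misleading + w_nat * naturalness

# A perturbation that retrieves well and misleads but reads unnaturally
# scores lower than one that balances all three objectives:
print(round(combined_reward(0.9, 0.9, 0.1), 2))  # 0.74
print(round(combined_reward(0.8, 0.8, 0.8), 2))  # 0.8
```

The point of jointly weighting naturalness is that it penalizes otherwise effective perturbations that a human reader would notice, which is the stealthiness property the paper emphasizes.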
Hongru Song
Key Laboratory of Network Data Science and Technology, Institute of Computing Technology, CAS; State Key Laboratory of AI Safety; University of Chinese Academy of Sciences
Yu-an Liu
Key Laboratory of Network Data Science and Technology, Institute of Computing Technology, CAS; State Key Laboratory of AI Safety; University of Chinese Academy of Sciences
Ruqing Zhang
Institute of Computing Technology, Chinese Academy of Sciences
Information Retrieval · Natural Language Processing · Large Language Models
Jiafeng Guo
Professor, Institute of Computing Technology, CAS
Information Retrieval · Machine Learning · Text Analysis · NeuIR
Jianming Lv
Assistant Professor, School of Computer Science and Engineering, South China University of
Security and Privacy · Peer-to-Peer · Data mining
M. D. Rijke
University of Amsterdam
Xueqi Cheng
Ph.D. student, Florida State University
Data mining · LLM · GNN · Computational social science