🤖 AI Summary
This work introduces the first imperceptible adversarial attack targeting the retrieval-to-generation pipeline of black-box retrieval-augmented generation (RAG) systems. To manipulate RAG outputs while remaining undetectable to humans, the authors propose ReGENT, a reinforcement learning framework that jointly optimizes three objectives: retrieval relevance, generation misleadingness, and textual naturalness. Under black-box constraints, ReGENT tracks interactions between the attacker and the target RAG system and continuously refines its perturbation strategy from these combined rewards. Evaluated on newly constructed factual and non-factual question-answering benchmarks, ReGENT achieves significantly higher attack success rates than prior methods across mainstream RAG systems while making only minimal text perturbations (average character-level modification rate below 0.8%). Crucially, the perturbed inputs remain natural and readable, keeping the attack stealthy.
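The three objectives above must be folded into a single scalar reward for the policy update. A minimal sketch of such a combination is below; the weighted-sum form, the weight values, and the assumption that each signal lies in [0, 1] are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical three-part attack reward. The weights (w_rel, w_mis, w_nat)
# and the [0, 1] scaling of each signal are illustrative assumptions.

def combined_reward(relevance: float, misleadingness: float,
                    naturalness: float,
                    w_rel: float = 0.4, w_mis: float = 0.4,
                    w_nat: float = 0.2) -> float:
    """Weighted sum of retrieval relevance, generation misleadingness,
    and textual naturalness, each assumed to lie in [0, 1]."""
    return w_rel * relevance + w_mis * misleadingness + w_nat * naturalness

# A perturbation that retrieves well and misleads while staying natural
# scores higher than one that sacrifices naturalness:
score = combined_reward(0.9, 0.8, 0.95)
```

A scalar reward like this lets a standard policy-gradient update trade off the three objectives; the naturalness term is what penalizes perturbations large enough for a human to notice.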
📝 Abstract
We explore adversarial attacks against retrieval-augmented generation (RAG) systems to identify their vulnerabilities. We focus on generating human-imperceptible adversarial examples and introduce a novel imperceptible retrieve-to-generate attack against RAG. The task is to find imperceptible perturbations that cause a target document, originally excluded from the initial top-$k$ candidate set, to be retrieved and thereby influence the final answer generation. To address this task, we propose ReGENT, a reinforcement learning-based framework that tracks interactions between the attacker and the target RAG system and continuously refines attack strategies based on relevance-generation-naturalness rewards. Experiments on newly constructed factual and non-factual question-answering benchmarks demonstrate that ReGENT significantly outperforms existing attack methods in misleading RAG systems using only small, imperceptible text perturbations.
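The attacker–RAG interaction loop the abstract describes can be sketched with a greedy hill-climbing stand-in for the RL policy: query the black-box retriever, keep a small perturbation only if it raises the target document's score, and stop once the target enters the top-$k$. The word-overlap retriever, the toy corpus, and the one-word edit step below are all illustrative assumptions, not the paper's actual components.

```python
import random

def retrieval_score(query: str, doc: str) -> float:
    """Toy black-box retriever: fraction of query words found in the doc."""
    q_words = set(query.lower().split())
    d_words = set(doc.lower().split())
    return len(q_words & d_words) / max(len(q_words), 1)

def top_k(query: str, corpus: list[str], k: int) -> list[str]:
    """Rank the corpus by retrieval score and keep the k best documents."""
    return sorted(corpus, key=lambda d: retrieval_score(query, d),
                  reverse=True)[:k]

def attack(query: str, target: str, corpus: list[str], k: int,
           steps: int = 50, seed: int = 0) -> str:
    """Greedy stand-in for ReGENT's RL loop: apply a small edit, query the
    retriever, and keep the edit only if the target's score improves."""
    rng = random.Random(seed)
    q_words = query.lower().split()
    doc = target
    for _ in range(steps):
        if doc in top_k(query, corpus + [doc], k):
            break  # target document now appears in the top-k results
        candidate = doc + " " + rng.choice(q_words)  # one-word perturbation
        if retrieval_score(query, candidate) > retrieval_score(query, doc):
            doc = candidate  # accept the edit; reject otherwise
    return doc

query = "who discovered penicillin"
corpus = ["alexander fleming discovered penicillin in 1928"]
adv = attack(query, "howard florey mass produced the drug", corpus, k=1)
```

The real attack replaces the greedy accept/reject rule with a learned policy and combines retrieval relevance with generation-misleading and naturalness rewards, so perturbations stay small enough to be imperceptible rather than simply appending query terms.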