The RAG Paradox: A Black-Box Attack Exploiting Unintentional Vulnerabilities in Retrieval-Augmented Generation Systems

📅 2025-02-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work identifies the “RAG paradox”: retrieval-augmented generation (RAG) systems cite their sources to enhance trustworthiness, yet that transparency inadvertently exposes vulnerabilities that black-box attacks can exploit. Building on this observation, we propose the first realistic black-box attack paradigm, which requires no internal model access. The attack automatically identifies the publicly cited documents in RAG outputs, then generates and publishes adversarial “poisoned” documents containing misleading information, thereby corrupting both retrieval and generation. Experiments demonstrate significant degradation in response accuracy and factual consistency across diverse RAG systems. Further, we introduce an intrinsic defense based on retrieval re-ranking, which improves robustness without modifying the underlying LLM or retriever. This is the first systematic study to formally characterize, empirically validate, and mitigate black-box poisoning threats against RAG systems.

📝 Abstract
With the growing adoption of retrieval-augmented generation (RAG) systems, recent studies have introduced attack methods aimed at degrading their performance. However, these methods rely on unrealistic white-box assumptions, such as attackers having access to RAG systems' internal processes. To address this issue, we introduce a realistic black-box attack scenario based on the RAG paradox, where RAG systems inadvertently expose vulnerabilities while attempting to enhance trustworthiness. Because RAG systems reference external documents during response generation, our attack targets these sources without requiring internal access. Our approach first identifies the external sources disclosed by RAG systems and then automatically generates poisoned documents with misinformation designed to match these sources. Finally, these poisoned documents are newly published on the disclosed sources, disrupting the RAG system's response generation process. Both offline and online experiments confirm that this attack significantly reduces RAG performance without requiring internal access. Furthermore, from an insider perspective within the RAG system, we propose a re-ranking method that acts as a fundamental safeguard, offering minimal protection against unforeseen attacks.
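To make the retrieval side of the attack concrete, here is a toy, fully offline illustration of why a document written to match a query can displace the genuine source. It is a minimal sketch, not the paper's pipeline: the bag-of-words retriever, the three-document corpus, and the names `tokenize`, `cosine`, and `retrieve` are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, corpus, k=1):
    """Return the ids of the k documents most similar to the query."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(text))), doc_id)
              for doc_id, text in corpus.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical in-memory corpus standing in for an external source.
corpus = {
    "genuine": "The Eiffel Tower, completed in 1889, stands in Paris, France.",
    "unrelated": "Bananas are rich in potassium.",
}
query = "Where is the Eiffel Tower located?"
before = retrieve(query, corpus)

# A newly "published" document that echoes the query wording but states a
# false fact. It overlaps the query far more than the genuine text does.
corpus["poisoned"] = (
    "Where is the Eiffel Tower located? The Eiffel Tower is located in Rome."
)
after = retrieve(query, corpus)

print(before, after)
```

In this toy setup the genuine document is retrieved first until the poisoned one is added, after which the poisoned document takes the top slot. Real retrievers use dense embeddings rather than term counts, so the effect in practice depends on the retriever and corpus.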

Problem

Research questions and friction points this paper is trying to address.

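Existing RAG attacks rely on unrealistic white-box assumptions about access to a system's internal processes.
RAG systems disclose the external sources they reference, which lets an attacker inject misinformation without internal access.
RAG systems need a basic safeguard, such as re-ranking, against unforeseen black-box attacks.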

Innovation

Methods, ideas, or system contributions that make the work stand out.

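Introduces a black-box attack that targets the external sources a RAG system discloses.
Automatically generates poisoned documents with misinformation designed to match those sources.
Proposes a re-ranking method as a minimal safeguard for the system.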
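
The summary does not describe how the proposed re-ranking works, so the following is only a hedged illustration of the general idea of re-ranking retrieved documents before generation. It swaps in a simple recency penalty, motivated by the abstract's note that poisoned documents are newly published. This heuristic, the `Retrieved` record, and the `min_age_days` and `penalty` parameters are assumptions for this sketch, not the authors' method.

```python
from dataclasses import dataclass


@dataclass
class Retrieved:
    doc_id: str
    score: float   # similarity score reported by the retriever
    age_days: int  # days since the document was published (hypothetical field)


def rerank(docs, min_age_days=30, penalty=0.5):
    """Re-order retrieved documents, discounting very recent ones.

    Documents younger than min_age_days have their score multiplied by
    penalty; older documents keep their retriever score unchanged.
    """
    def adjusted(doc):
        factor = penalty if doc.age_days < min_age_days else 1.0
        return doc.score * factor

    return sorted(docs, key=adjusted, reverse=True)


# Hypothetical retriever output: the poisoned document scores highest
# but was published two days ago.
candidates = [
    Retrieved("poisoned", score=0.80, age_days=2),
    Retrieved("genuine", score=0.60, age_days=2000),
    Retrieved("background", score=0.30, age_days=400),
]
reranked = rerank(candidates)
print([doc.doc_id for doc in reranked])
```

With these illustrative numbers the poisoned document's adjusted score falls to 0.40, so the genuine document moves to the top. A recency penalty also demotes legitimately fresh content, which is one reason to treat it as a baseline rather than a complete defense.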