ReliabilityRAG: Effective and Provably Robust Defense for RAG-based Web-Search

📅 2025-09-27

🤖 AI Summary

Retrieval-Augmented Generation (RAG) systems are vulnerable to corpus-level attacks such as prompt injection, especially in real-world settings like web search, which calls for robust defenses. This paper proposes a defense framework grounded in document reliability. First, it builds a graph whose edges connect mutually contradicting documents and applies a Maximum Independent Set (MIS) algorithm that prioritizes higher-reliability documents, selecting a consistent, reliable subset with provable robustness guarantees against bounded adversarial corruption. Second, because exact MIS is costly for large retrieval sets, it feeds intrinsic reliability signals such as retrieval ranking into a scalable weighted sample-and-aggregate mechanism. Experiments show that the approach outperforms existing defenses against adversarial attacks while preserving high accuracy on benign queries. It also remains robust in long-form generation tasks, where prior robustness-focused defenses struggled, supporting its effectiveness across threat models and application scenarios.
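The MIS filtering step can be illustrated with a brute-force sketch. It assumes documents are indexed by retrieval rank (0 = most reliable) and that the contradiction edges are already known; in a real system a model would judge pairwise contradiction, which is out of scope here. Preferring the lexicographically smallest maximum set is one hypothetical way to "prioritize higher-reliability documents", not necessarily the paper's exact rule, and exhaustive search is only practical for small retrieval sets.

```python
from itertools import combinations


def is_independent(subset, edges):
    """True if no two documents in `subset` are joined by a contradiction edge."""
    chosen = set(subset)
    return not any(u in chosen and v in chosen for u, v in edges)


def reliability_mis(num_docs, edges):
    """Exact maximum independent set by brute force (exponential in num_docs).

    Documents are indexed by retrieval rank, 0 = most reliable. combinations()
    yields subsets in lexicographic order, so the first independent set found
    at the largest size is the one that favors the highest-ranked documents.
    """
    for size in range(num_docs, 0, -1):
        for subset in combinations(range(num_docs), size):
            if is_independent(subset, edges):
                return list(subset)
    return []


# Docs 3 and 4 are injected: they agree with each other but contradict 0, 1, 2.
edges = {(0, 3), (0, 4), (1, 3), (1, 4), (2, 3), (2, 4)}
print(reliability_mis(5, edges))  # [0, 1, 2]

# Tie: {0, 1} and {2, 3} are both maximum; the rank-based tie-break keeps {0, 1}.
print(reliability_mis(4, {(0, 2), (0, 3), (1, 2), (1, 3)}))  # [0, 1]
```

The first call shows the "consistent majority" idea: the larger mutually consistent group survives. The second shows where reliability matters, since size alone cannot break the tie.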

📝 Abstract

Retrieval-Augmented Generation (RAG) enhances Large Language Models by grounding their outputs in external documents. These systems, however, remain vulnerable to attacks on the retrieval corpus, such as prompt injection. RAG-based search systems (e.g., Google's Search AI Overview) present an interesting setting for studying and protecting against such threats: defense algorithms can benefit from built-in reliability signals, like document ranking, which pose a non-LLM challenge for the adversary given decades of work to thwart search engine optimization (SEO) manipulation. Motivated by, but not limited to, this scenario, this work introduces ReliabilityRAG, a framework for adversarial robustness that explicitly leverages reliability information of retrieved documents. Our first contribution adopts a graph-theoretic perspective to identify a "consistent majority" among retrieved documents to filter out malicious ones. We introduce a novel algorithm based on finding a Maximum Independent Set (MIS) on a document graph where edges encode contradiction. Our MIS variant explicitly prioritizes higher-reliability documents and provides provable robustness guarantees against bounded adversarial corruption under natural assumptions. Recognizing the computational cost of exact MIS for large retrieval sets, our second contribution is a scalable weighted sample-and-aggregate framework. It explicitly utilizes reliability information, preserving some robustness guarantees while efficiently handling many documents. We present empirical results showing ReliabilityRAG provides superior robustness against adversarial attacks compared to prior methods, maintains high benign accuracy, and excels in long-form generation tasks where prior robustness-focused methods struggled. Our work is a significant step towards more effective, provably robust defenses against retrieved corpus corruption in RAG.
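The weighted sample-and-aggregate idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the rank-decay weights `1 / (i + 1) ** decay`, the subset size `m`, the trial count, majority-vote aggregation and the toy `answer_fn` are all hypothetical choices. Each trial draws a small subset of documents, favoring higher-ranked ones, answers from that subset alone, and the final answer is the most frequent one.

```python
import random
from collections import Counter


def weighted_sample(num_docs, m, rng, decay=1.0):
    """Draw m distinct document indices without replacement.

    Hypothetical weighting: the document at rank i gets weight 1 / (i + 1) ** decay,
    so higher-ranked (lower-index) documents are drawn more often.
    """
    remaining = list(range(num_docs))
    picked = []
    for _ in range(min(m, num_docs)):
        weights = [1.0 / (i + 1) ** decay for i in remaining]
        choice = rng.choices(remaining, weights=weights, k=1)[0]
        remaining.remove(choice)
        picked.append(choice)
    return sorted(picked)  # keep rank order inside the subset


def sample_and_aggregate(docs, answer_fn, m=2, trials=20, seed=0):
    """Answer from many small weighted subsets, then take a majority vote.

    answer_fn is a stand-in for an LLM call that answers using only the
    documents it is given.
    """
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(trials):
        subset = weighted_sample(len(docs), m, rng)
        votes[answer_fn([docs[i] for i in subset])] += 1
    answer, _ = votes.most_common(1)[0]
    return answer, votes


def toy_answer(subset_docs):
    """Toy 'LLM': most common claim in the subset; ties go to the higher-ranked doc."""
    return Counter(subset_docs).most_common(1)[0][0]


# Ranks 0-2 are benign; ranks 3-4 are injected documents carrying a false claim.
docs = ["Paris", "Paris", "Paris", "Lyon", "Lyon"]
answer, votes = sample_and_aggregate(docs, toy_answer)
print(answer, dict(votes))  # the benign claim is expected to win the vote
```

In this toy setup the false claim wins a trial only when both sampled documents are injected ones, which the rank-decay weights make rare. Majority voting suits short answers; long-form outputs would need a different aggregation step, which this sketch does not attempt.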

Problem

Research questions and friction points this paper is trying to address.

Defends RAG systems against adversarial corpus corruption attacks

Leverages document reliability signals to filter malicious content

Provides provable robustness guarantees while maintaining high accuracy

Innovation

Methods, ideas, or system contributions that make the work stand out.

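Leverages document reliability signals and a document contradiction graph for adversarial robustness

Uses a Maximum Independent Set algorithm that prioritizes higher-reliability documents to filter out malicious ones

Implements a scalable weighted sample-and-aggregate framework for large retrieval sets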
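A short calculation suggests why sampling small subsets limits a bounded adversary. Assume, for illustration only, the simplest unweighted variant: each trial draws m of the n retrieved documents uniformly without replacement, and k of them are corrupted. The probability that a trial sees no corrupted document is then hypergeometric (this is not the paper's weighted scheme or its stated guarantee):

```latex
\Pr[\text{trial is clean}]
  = \frac{\binom{n-k}{m}}{\binom{n}{m}}
  = \prod_{i=0}^{m-1} \frac{n-k-i}{n-i},
\qquad
n = 10,\ k = 1,\ m = 2:\quad \frac{9}{10}\cdot\frac{8}{9} = 0.8 .
```

With T trials, a fraction of about 0.8 are clean in expectation in this example. Weighting the draw toward higher-ranked documents raises the clean fraction further when corrupted documents tend to rank low, which is the intuition the abstract gives for relying on ranking signals that are hard to game through SEO.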

🔎 Similar Papers

Is My Data in Your Retrieval Database? Membership Inference Attacks Against Retrieval Augmented Generation