GRADA: Graph-based Reranker against Adversarial Documents Attack

📅 2025-05-12

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Retrieval-augmented generation (RAG) systems are vulnerable to adversarial document attacks, in which malicious documents crafted to be semantically similar to the query are injected to manipulate retrieval results. Method: We propose a lightweight, general-purpose graph-based reranking framework. It models the retrieved document set as a similarity graph and uses a graph neural network-inspired construction to identify weakly connected, anomalous nodes (i.e., adversarial documents) from differences in each node's local connectivity, enabling unsupervised filtering. The method requires no modification to the retriever or the large language model (LLM). Contribution/Results: The framework is compatible with mainstream LLMs, including GPT, Llama, and Qwen models, and achieves up to an 80% reduction in attack success rate on the Natural Questions dataset with minimal loss in accuracy. This improves the robustness of RAG systems while keeping the defense practical to deploy.
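The filtering idea in the summary can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes each retrieved document is already represented by an embedding vector, uses mean cosine similarity to the other retrieved documents as a stand-in for "local connectivity", and keeps a fixed number of documents. The names `cosine`, `connectivity_scores`, `filter_low_connectivity`, and the `keep` parameter are hypothetical, and the vectors are toy data.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def connectivity_scores(doc_vecs):
    """Mean similarity of each document to every other retrieved document.

    Assumption: this average is used here as a simple proxy for how well
    a node is connected in the similarity graph.
    """
    n = len(doc_vecs)
    scores = []
    for i in range(n):
        peers = [cosine(doc_vecs[i], doc_vecs[j]) for j in range(n) if j != i]
        scores.append(sum(peers) / len(peers) if peers else 0.0)
    return scores


def filter_low_connectivity(doc_vecs, keep):
    """Return indices of the `keep` best-connected documents, best first."""
    scores = connectivity_scores(doc_vecs)
    ranked = sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)
    return ranked[:keep]


# Toy embeddings: documents 0-2 point in a similar direction, document 3 does not.
toy_docs = [
    [1.0, 0.1, 0.0],
    [0.9, 0.2, 0.0],
    [1.0, 0.0, 0.1],
    [0.0, 0.1, 1.0],  # weakly connected outlier
]
kept = filter_low_connectivity(toy_docs, keep=3)
```

In this toy set the outlier at index 3 has the lowest score and is dropped. A fixed `keep` is only one possible policy; a threshold on the score would be another.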

📝 Abstract

Retrieval-Augmented Generation (RAG) frameworks improve the accuracy of large language models (LLMs) by integrating external knowledge from retrieved documents, thereby overcoming the limitations of the models' static intrinsic knowledge. However, these systems are susceptible to adversarial attacks that manipulate the retrieval process by introducing documents that are adversarial yet semantically similar to the query. Notably, while these adversarial documents resemble the query, they exhibit weak similarity to the benign documents in the retrieval set. We therefore propose a simple yet effective Graph-based Reranking against Adversarial Document Attacks (GRADA) framework that aims to preserve retrieval quality while significantly reducing the success of adversaries. We evaluate the approach on five LLMs: GPT-3.5-Turbo, GPT-4o, Llama3.1-8b, Llama3.1-70b, and Qwen2.5-7b. We use three datasets to assess performance; results on the Natural Questions dataset show up to an 80% reduction in attack success rate with minimal loss in accuracy.
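The observation in the abstract (adversarial documents resemble the query but not the benign documents) can be made concrete with a small sketch. It contrasts ranking by query similarity with ranking by a PageRank-style score over the document similarity graph. PageRank-style propagation is swapped in here as one common way to score graph nodes; the abstract does not state which scoring rule GRADA uses. All function names, the damping value, and the toy vectors are assumptions for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_graph(doc_vecs):
    """Weighted adjacency matrix: non-negative cosine similarity, no self-loops."""
    n = len(doc_vecs)
    return [
        [max(cosine(doc_vecs[i], doc_vecs[j]), 0.0) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]


def pagerank_scores(weights, damping=0.85, iters=50):
    """Power iteration for weighted PageRank (nodes with no edges pass on no score)."""
    n = len(weights)
    out_sums = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iters):
        updated = []
        for j in range(n):
            inflow = sum(
                scores[i] * weights[i][j] / out_sums[i]
                for i in range(n)
                if out_sums[i] > 0
            )
            updated.append((1 - damping) / n + damping * inflow)
        scores = updated
    return scores


def rank_by(values):
    """Indices sorted from highest to lowest value."""
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)


# Toy data: documents 0-2 are benign and mutually similar; document 3 mimics the query.
query = [0.5, 1.0, 0.0]
toy_docs = [
    [1.0, 0.2, 0.1],
    [0.9, 0.1, 0.2],
    [1.0, 0.3, 0.0],
    [0.1, 1.0, 0.0],  # hypothetical adversarial document
]

by_query = rank_by([cosine(query, d) for d in toy_docs])
by_graph = rank_by(pagerank_scores(build_graph(toy_docs)))
```

On this toy data, `by_query` places document 3 first because it is closest to the query, while `by_graph` places it last because it is only weakly connected to the other documents.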

Problem

Research questions and friction points this paper is trying to address.

Adversarial documents that resemble the query can manipulate the RAG retrieval process

How to reduce the success rate of adversarial document attacks

How to maintain retrieval accuracy while defending against these attacks

Innovation

Methods, ideas, or system contributions that make the work stand out.

Graph-based reranking to counter adversarial documents

Preserves retrieval quality while reducing attack success

Evaluated on five LLMs, with up to an 80% reduction in attack success rate on Natural Questions

🔎 Similar Papers

Bridging Social Media and Search Engines: Dredge Words and the Detection of Unreliable Domains