🤖 AI Summary
Retrieval-augmented generation (RAG) systems are vulnerable to knowledge corruption attacks, in which poisoned data injected into the knowledge base induces erroneous model outputs. To address this, the paper proposes RAGDefender, a lightweight, training-free defense that operates in the post-retrieval phase: it applies semantic consistency analysis and anomaly detection directly to the retrieved candidate passages, using efficient machine learning techniques to identify and filter out contaminated content. Its key advantage is that it requires no parameter fine-tuning and no additional model training or inference. Experiments show that when adversarial passages outnumber legitimate ones four to one (4x), RAGDefender reduces the attack success rate (ASR) against the Gemini model from 0.89 to 0.02, a substantial improvement over existing defenses, while preserving RAG's efficiency and accuracy.
📝 Abstract
Large language models (LLMs) are reshaping numerous facets of our daily lives, leading to widespread adoption as web-based services. Despite their versatility, LLMs face notable challenges, such as generating hallucinated content and lacking access to up-to-date information. To address such limitations, Retrieval-Augmented Generation (RAG) has recently emerged as a promising direction, generating responses grounded in external knowledge sources. A typical RAG system consists of i) a retriever that fetches a set of relevant passages from a knowledge base and ii) a generator that formulates a response based on the retrieved content. However, as with other AI systems, recent studies demonstrate the vulnerability of RAG, for example to knowledge corruption attacks that inject misleading information into the knowledge base. In response, several defense strategies have been proposed, including having LLMs inspect the retrieved passages individually or fine-tuning robust retrievers. While effective, such approaches often come with substantial computational costs.
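The two-stage retriever/generator structure described above can be sketched minimally as follows. This is a toy illustration, not the paper's system: the retriever here ranks passages by bag-of-words cosine similarity, and `generate` is a stand-in for an LLM call; all names (`retrieve`, `generate`, the sample knowledge base) are hypothetical.

```python
from collections import Counter
from math import sqrt

def bow(text):
    """Bag-of-words term-frequency vector (toy stand-in for an embedding)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, knowledge_base, k=2):
    """i) Retriever: return the top-k passages most similar to the query."""
    q = bow(query)
    ranked = sorted(knowledge_base, key=lambda p: cosine(q, bow(p)), reverse=True)
    return ranked[:k]

def generate(query, passages):
    """ii) Generator: placeholder for an LLM that grounds its answer in the passages."""
    return f"Answer to {query!r} based on: " + " | ".join(passages)

kb = [
    "The Eiffel Tower is in Paris.",
    "Python is a programming language.",
    "Paris is the capital of France.",
]
ctx = retrieve("Where is the Eiffel Tower?", kb, k=2)
print(generate("Where is the Eiffel Tower?", ctx))
```

In a real deployment the retriever would use dense embeddings over a vector index and the generator would be an actual LLM, but the data flow (query → top-k passages → grounded response) is the same.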
In this work, we introduce RAGDefender, a resource-efficient defense mechanism against knowledge corruption (i.e., by data poisoning) attacks in practical RAG deployments. RAGDefender operates during the post-retrieval phase, leveraging lightweight machine learning techniques to detect and filter out adversarial content without requiring additional model training or inference. Our empirical evaluations show that RAGDefender consistently outperforms existing state-of-the-art defenses across multiple models and adversarial scenarios: e.g., RAGDefender reduces the attack success rate (ASR) against the Gemini model from 0.89 to as low as 0.02, compared to 0.69 for RobustRAG and 0.24 for Discern-and-Answer when adversarial passages outnumber legitimate ones by a factor of four (4x).
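To make the post-retrieval idea concrete, here is a minimal sketch of filtering anomalous passages after retrieval and before generation. This is not RAGDefender's actual algorithm (the paper does not specify it here); it assumes a simple heuristic in which each retrieved passage is scored by its mean bag-of-words cosine similarity to the other retrieved passages, and low-scoring outliers are dropped. The function name `filter_passages` and the threshold are illustrative.

```python
from collections import Counter
from math import sqrt

def bow(text):
    """Bag-of-words term-frequency vector (toy stand-in for an embedding)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def filter_passages(passages, threshold=0.1):
    """Post-retrieval filter: keep passages whose mean similarity to the
    other retrieved passages clears `threshold`; drop the rest as anomalous.
    Runs with no model training and no extra LLM inference."""
    vecs = [bow(p) for p in passages]
    kept = []
    for i, p in enumerate(passages):
        sims = [cosine(vecs[i], vecs[j]) for j in range(len(passages)) if j != i]
        if sims and sum(sims) / len(sims) >= threshold:
            kept.append(p)
    return kept

retrieved = [
    "The Amazon is the largest rainforest on Earth.",
    "The Amazon rainforest spans several South American countries.",
    "Buy cheap watches now at example dot com!",  # injected adversarial text
]
print(filter_passages(retrieved))
```

Note that such a majority-consistency heuristic breaks down when adversarial passages dominate the retrieved set (the 4x regime the paper evaluates), which is precisely the harder setting RAGDefender is designed to handle.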