Detecting RAG Extraction Attack via Dual-Path Runtime Integrity Game

📅 2026-04-12

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the vulnerability of Retrieval-Augmented Generation (RAG) systems to adversarial prompts that induce the model to leak proprietary knowledge from its retrieved context. To counter this threat, the authors propose CanaryRAG, a runtime defense that embeds canary tokens into retrieved chunks and verifies them through a dual-path integrity check, so that leakage is detected in real time even when the attacker adaptively suppresses or obfuscates the canaries. CanaryRAG requires no model retraining or architectural modification. Experiments show that it substantially reduces the chunk recovery rate relative to state-of-the-art defenses, with negligible impact on task performance and inference latency.

📝 Abstract

Retrieval-Augmented Generation (RAG) systems augment large language models with external knowledge, yet introduce a critical security vulnerability: RAG Knowledge Base Leakage, wherein adversarial prompts can induce the model to divulge retrieved proprietary content. Recent studies reveal that such leakage can be executed through adaptive and iterative attack strategies (termed RAG extraction attacks), while effective countermeasures remain notably lacking. To bridge this gap, we propose CanaryRAG, a runtime defense mechanism inspired by stack canaries in software security. CanaryRAG embeds carefully designed canary tokens into retrieved chunks and reformulates RAG extraction defense as a dual-path runtime integrity game. Leakage is detected in real time whenever either the target path or the oracle path violates its expected canary behavior, including under adaptive suppression and obfuscation. Extensive evaluations against existing attacks demonstrate that CanaryRAG provides robust defense, achieving substantially lower chunk recovery rates than state-of-the-art baselines while imposing negligible impact on task performance and inference latency. Moreover, as a plug-and-play solution, CanaryRAG can be seamlessly integrated into arbitrary RAG pipelines without requiring retraining or structural modifications, offering a practical and scalable safeguard for proprietary data.
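
The abstract does not spell out the canary format or what each path is expected to do, so the following is only a minimal sketch of one possible reading, not the authors' implementation. It assumes (hypothetically) that the target path's answer should never contain a canary, since a canary in the answer suggests a chunk was copied out, and that the oracle path is asked to echo the canaries, so a missing canary suggests suppression or obfuscation. The names `make_canary`, `embed_canary`, and `leak_detected`, and the `[[CANARY-...]]` format, are illustrative assumptions.

```python
import secrets
from typing import Iterable


def make_canary(nbytes: int = 8) -> str:
    """Generate an unpredictable marker (the format is an assumption)."""
    return f"[[CANARY-{secrets.token_hex(nbytes)}]]"


def embed_canary(chunk: str, canary: str) -> str:
    """Wrap a retrieved chunk with the canary on both sides."""
    return f"{canary} {chunk} {canary}"


def leak_detected(target_output: str, oracle_output: str,
                  canaries: Iterable[str]) -> bool:
    """Flag a violation on either path (assumed expectations).

    Target path: the answer should contain no canary.
    Oracle path: the output should contain every canary.
    """
    canaries = list(canaries)
    target_violation = any(c in target_output for c in canaries)
    oracle_violation = not all(c in oracle_output for c in canaries)
    return target_violation or oracle_violation


if __name__ == "__main__":
    canary = make_canary()
    protected = embed_canary("Internal pricing policy text.", canary)

    # Benign case: the answer omits the canary, the oracle echoes it.
    print(leak_detected("Prices vary by region.", canary, [canary]))

    # Verbatim extraction: the protected chunk appears in the answer.
    print(leak_detected(protected, canary, [canary]))

    # Suppression: the attacker instructs the model to drop markers,
    # so the oracle path can no longer reproduce the canary.
    print(leak_detected("Prices vary by region.", "", [canary]))
```

Under these assumptions, an attacker who instructs the model to strip the markers avoids the target-path check but trips the oracle-path check, which is the intuition behind treating detection as a two-sided game. Exact substring matching as used here would not catch re-encoded or paraphrased canaries; how the paper handles obfuscation is not stated in the abstract.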

Problem

Research questions and friction points this paper is trying to address.

RAG extraction attack

Knowledge Base Leakage

Retrieval-Augmented Generation

adversarial prompts

security vulnerability

Innovation

Methods, ideas, or system contributions that make the work stand out.

RAG extraction attack

runtime integrity

canary tokens