AI Summary
This work addresses the vulnerability of Retrieval-Augmented Generation (RAG) systems to adversarial prompts that cause them to leak proprietary knowledge. To counter this threat, the authors propose CanaryRAG, the first runtime defense mechanism that embeds canary tokens into retrieved passages to detect and mitigate leakage in real time, including under adaptive suppression and obfuscation attacks. CanaryRAG pairs these tokens with a dual-path integrity verification framework and requires no model retraining or architectural modifications. Experimental results show that CanaryRAG substantially reduces the recovery rate of sensitive knowledge chunks compared to state-of-the-art defenses, while imposing negligible overhead on task performance and inference latency.
Abstract
Retrieval-Augmented Generation (RAG) systems augment large language models with external knowledge, yet introduce a critical security vulnerability: RAG Knowledge Base Leakage, wherein adversarial prompts can induce the model to divulge retrieved proprietary content. Recent studies show that such leakage can be carried out through adaptive and iterative attack strategies (termed RAG extraction attacks), yet effective countermeasures are still lacking. To bridge this gap, we propose CanaryRAG, a runtime defense mechanism inspired by stack canaries in software security. CanaryRAG embeds carefully designed canary tokens into retrieved chunks and reformulates RAG extraction defense as a dual-path runtime integrity game. Leakage is detected in real time whenever either the target or oracle path violates its expected canary behavior, including under adaptive suppression and obfuscation. Extensive evaluations against existing attacks demonstrate that CanaryRAG provides robust defense, achieving substantially lower chunk recovery rates than state-of-the-art baselines while imposing negligible impact on task performance and inference latency. Moreover, as a plug-and-play solution, CanaryRAG can be seamlessly integrated into arbitrary RAG pipelines without requiring retraining or structural modifications, offering a practical and scalable safeguard for proprietary data.
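The canary idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the canary format, where the tokens are placed, or what "expected canary behavior" means for each path. The sketch assumes, purely for illustration, that the target path must never reproduce a canary and that the oracle path must reproduce every canary, so that an attack which suppresses canaries still trips the check. All function names and the `[[CANARY-...]]` format are hypothetical.

```python
import secrets


def make_canary(n_bytes: int = 8) -> str:
    """Return a random, hard-to-guess marker (hypothetical format)."""
    return f"[[CANARY-{secrets.token_hex(n_bytes)}]]"


def embed_canaries(chunks: list[str]) -> tuple[list[str], list[str]]:
    """Wrap each retrieved chunk in a fresh canary; return marked chunks and canaries."""
    marked, canaries = [], []
    for chunk in chunks:
        canary = make_canary()
        canaries.append(canary)
        marked.append(f"{canary} {chunk} {canary}")
    return marked, canaries


def normalize(text: str) -> str:
    """Lowercase and drop non-alphanumerics so trivial spacing or
    punctuation obfuscation does not hide a canary."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def target_path_ok(output: str, canaries: list[str]) -> bool:
    """Assumed rule: a benign answer never reproduces any canary."""
    norm = normalize(output)
    return not any(normalize(c) in norm for c in canaries)


def oracle_path_ok(oracle_output: str, canaries: list[str]) -> bool:
    """Assumed rule: the oracle path must reproduce every canary;
    a missing canary suggests a suppression attempt."""
    norm = normalize(oracle_output)
    return all(normalize(c) in norm for c in canaries)


def leakage_detected(target_output: str, oracle_output: str,
                     canaries: list[str]) -> bool:
    """Flag leakage if either path violates its assumed canary behavior."""
    return not (target_path_ok(target_output, canaries)
                and oracle_path_ok(oracle_output, canaries))
```

In a pipeline, `embed_canaries` would run on the retriever output before prompt assembly, and `leakage_detected` would run on the model outputs before anything is returned to the user. The normalization step only covers very simple obfuscation; handling the adaptive obfuscation the abstract mentions would need a stronger matching strategy than this sketch provides.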