Detecting RAG Extraction Attack via Dual-Path Runtime Integrity Game

2026-04-12
Citations: 0
Influential: 0

AI Summary
This work addresses the vulnerability of Retrieval-Augmented Generation (RAG) systems to adversarial prompt attacks that can leak proprietary knowledge. To counter this threat, the authors propose CanaryRAG, the first runtime defense mechanism that integrates canary tokens into retrieved passages to enable real-time detection and mitigation of adaptive suppression and obfuscation attacks. By embedding canary tokens within retrieval blocks and employing a dual-path integrity verification framework, CanaryRAG operates without requiring model retraining or architectural modifications. Experimental results demonstrate that CanaryRAG substantially reduces the recovery rate of sensitive knowledge chunks compared to state-of-the-art defenses, while imposing negligible overhead on task performance and inference latency.

Abstract
Retrieval-Augmented Generation (RAG) systems augment large language models with external knowledge, yet introduce a critical security vulnerability: RAG Knowledge Base Leakage, wherein adversarial prompts can induce the model to divulge retrieved proprietary content. Recent studies reveal that such leakage can be executed through adaptive and iterative attack strategies (named RAG extraction attack), while effective countermeasures remain notably lacking. To bridge this gap, we propose CanaryRAG, a runtime defense mechanism inspired by stack canaries in software security. CanaryRAG embeds carefully designed canary tokens into retrieved chunks and reformulates RAG extraction defense as a dual-path runtime integrity game. Leakage is detected in real time whenever either the target or oracle path violates its expected canary behavior, including under adaptive suppression and obfuscation. Extensive evaluations against existing attacks demonstrate that CanaryRAG provides robust defense, achieving substantially lower chunk recovery rates than state-of-the-art baselines while imposing negligible impact on task performance and inference latency. Moreover, as a plug-and-play solution, CanaryRAG can be seamlessly integrated into arbitrary RAG pipelines without requiring retraining or structural modifications, offering a practical and scalable safeguard for proprietary data.
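The canary idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the canary format, where canaries are placed, or the exact rule each path must satisfy. The sketch assumes that the target path (the user-facing answer) should never contain the canary, and that the oracle path (a hypothetical second generation expected to echo the canary) should always contain it. All function names are hypothetical.

```python
import secrets


def make_canary(n_bytes: int = 8) -> str:
    """Return a random marker that is unlikely to occur in natural text."""
    return f"[[CANARY-{secrets.token_hex(n_bytes)}]]"


def embed_canary(chunk: str, canary: str) -> str:
    """Wrap a retrieved chunk with the canary (placement is an assumption)."""
    return f"{canary} {chunk} {canary}"


def normalize(text: str) -> str:
    """Lowercase and keep only alphanumerics, so that simple casing or
    punctuation changes do not hide the canary."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def canary_present(output: str, canary: str) -> bool:
    """Check whether the normalized canary occurs in the normalized output."""
    return normalize(canary) in normalize(output)


def leakage_detected(target_output: str, oracle_output: str, canary: str) -> bool:
    """Flag leakage if either path violates its assumed canary behavior.

    Assumed rules (not taken from the paper):
      - target path: the canary must be absent from the answer;
      - oracle path: the canary must be present, so a prompt that makes the
        model suppress the canary also trips the check.
    """
    target_violation = canary_present(target_output, canary)
    oracle_violation = not canary_present(oracle_output, canary)
    return target_violation or oracle_violation
```

Under these assumptions, a verbatim dump of a retrieved chunk carries the canary into the target output, while an instruction such as "omit any bracketed markers" removes it from the oracle output, so both attack styles surface as a violation on at least one path. The normalization step here handles only trivial obfuscation; stronger obfuscation would need a more robust matcher.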
Problem

Research questions and friction points this paper is trying to address.

RAG extraction attack
Knowledge Base Leakage
Retrieval-Augmented Generation
adversarial prompts
security vulnerability
Innovation

Methods, ideas, or system contributions that make the work stand out.

RAG extraction attack
runtime integrity
canary tokens
dual-path defense
knowledge leakage detection
Yuanbo Xie
Institute of Information Engineering, Chinese Academy of Sciences, China; School of Cyber Security, University of Chinese Academy of Sciences, China
Yingjie Zhang
Institute of Information Engineering, Chinese Academy of Sciences, China; School of Cyber Security, University of Chinese Academy of Sciences, China
Yulin Li
The Hong Kong University of Science and Technology
Optimization Theory, Robot Motion Planning & Control
Shouyou Song
Beijing University of Posts and Telecommunications, China
Xiaokun Chen
Stanford University
Zhihan Liu
Northwestern University
large language models, reinforcement learning, offline learning, online learning
Liya Su
AI Sec Lab, Beijing Chaitin Technology Co., Ltd
Tingwen Liu
Institute of Information Engineering, Chinese Academy of Sciences
Content Security, Natural Language Processing, Knowledge Graph