External Data Extraction Attacks against Retrieval-Augmented Large Language Models

📅 2025-10-03
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Retrieval-augmented large language models (RAG) are vulnerable to External Data Extraction Attacks (EDEA), wherein adversaries can extract verbatim sensitive or copyright-protected content from private knowledge bases. Method: This paper formally defines EDEA for the first time and proposes a unified attack framework integrating extraction instructions, jailbreaking prompts, and retrieval triggers. It introduces an LLM-optimizer-based adaptive jailbreaking prompt generation mechanism and a hybrid trigger strategy combining global exploration with local clustering. Contribution/Results: Evaluated across 16 diverse RAG instances spanning multiple LLM backbones, our method significantly improves extraction efficiency: it achieves the first successful extraction of 35% of the original knowledge base content from a Claude 3.7 Sonnet–powered RAG system—substantially outperforming prior approaches. This demonstrates a critical, previously underappreciated data leakage vulnerability in real-world RAG deployments.

Technology Category

Application Category

📝 Abstract
In recent years, RAG has emerged as a key paradigm for enhancing large language models (LLMs). By integrating externally retrieved information, RAG alleviates issues like outdated knowledge and, crucially, insufficient domain expertise. While effective, RAG introduces new risks of external data extraction attacks (EDEAs), where sensitive or copyrighted data in its knowledge base may be extracted verbatim. These risks are particularly acute when RAG is used to customize specialized LLM applications with private knowledge bases. Despite initial studies exploring these risks, they often lack a formalized framework, robust attack performance, and comprehensive evaluation, leaving critical questions about real-world EDEA feasibility unanswered. In this paper, we present the first comprehensive study to formalize EDEAs against retrieval-augmented LLMs. We first formally define EDEAs and propose a unified framework decomposing their design into three components: extraction instruction, jailbreak operator, and retrieval trigger, under which prior attacks can be considered instances within our framework. Guided by this framework, we develop SECRET: a Scalable and EffeCtive exteRnal data Extraction aTtack. Specifically, SECRET incorporates (1) an adaptive optimization process using LLMs as optimizers to generate specialized jailbreak prompts for EDEAs, and (2) cluster-focused triggering, an adaptive strategy that alternates between global exploration and local exploitation to efficiently generate effective retrieval triggers. Extensive evaluations across 4 models reveal that SECRET significantly outperforms previous attacks, and is highly effective against all 16 tested RAG instances. Notably, SECRET successfully extracts 35% of the data from RAG powered by Claude 3.7 Sonnet for the first time, whereas other attacks yield 0% extraction. Our findings call for attention to this emerging threat.
Problem

Research questions and friction points this paper is trying to address.

Formalizing external data extraction attacks against retrieval-augmented LLMs
Developing scalable attacks to extract sensitive data from RAG systems
Evaluating extraction feasibility across multiple models and RAG instances
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive optimization using LLMs as optimizers
Cluster-focused triggering with global and local strategies
Unified framework defining extraction instruction and jailbreak components
Y
Yu He
State Key Laboratory of Blockchain and Data Security, Zhejiang University, Hangzhou, 310027, China, and also with the Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security, Hangzhou, 310051, China
Y
Yifei Chen
State Key Laboratory of Blockchain and Data Security, Zhejiang University, Hangzhou, 310027, China, and also with the Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security, Hangzhou, 310051, China
Y
Yiming Li
College of Computing and Data Science, Nanyang Technological University, Singapore, 639798, Singapore
Shuo Shao
Shuo Shao
Zhejiang University
AI Copyright ProtectionData ProtectionLLM Safety
L
Leyi Qi
State Key Laboratory of Blockchain and Data Security, Zhejiang University, Hangzhou, 310027, China, and also with the Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security, Hangzhou, 310051, China
Boheng Li
Boheng Li
Nanyang Technological University
AI SecurityWatermarkingBackdoor AttackCopyright Protection
Dacheng Tao
Dacheng Tao
Nanyang Technological University
artificial intelligencemachine learningcomputer visionimage processingdata mining
Zhan Qin
Zhan Qin
Researcher, Zhejiang University
Data Security and PrivacyAI Security