External Data Extraction Attacks against Retrieval-Augmented Large Language Models

📅 2025-10-03

📈 Citations: 0

✨ Influential: 0

career value

189K/year

🤖 AI Summary

Retrieval-augmented large language models (RAG) are vulnerable to External Data Extraction Attacks (EDEA), wherein adversaries can extract verbatim sensitive or copyright-protected content from private knowledge bases. Method: This paper formally defines EDEA for the first time and proposes a unified attack framework integrating extraction instructions, jailbreaking prompts, and retrieval triggers. It introduces an LLM-optimizer-based adaptive jailbreaking prompt generation mechanism and a hybrid trigger strategy combining global exploration with local clustering. Contribution/Results: Evaluated across 16 diverse RAG instances spanning multiple LLM backbones, our method significantly improves extraction efficiency: it achieves the first successful extraction of 35% of the original knowledge base content from a Claude 3.7 Sonnet–powered RAG system—substantially outperforming prior approaches. This demonstrates a critical, previously underappreciated data leakage vulnerability in real-world RAG deployments.

Technology Category

Application Category

📝 Abstract

In recent years, RAG has emerged as a key paradigm for enhancing large language models (LLMs). By integrating externally retrieved information, RAG alleviates issues like outdated knowledge and, crucially, insufficient domain expertise. While effective, RAG introduces new risks of external data extraction attacks (EDEAs), where sensitive or copyrighted data in its knowledge base may be extracted verbatim. These risks are particularly acute when RAG is used to customize specialized LLM applications with private knowledge bases. Despite initial studies exploring these risks, they often lack a formalized framework, robust attack performance, and comprehensive evaluation, leaving critical questions about real-world EDEA feasibility unanswered. In this paper, we present the first comprehensive study to formalize EDEAs against retrieval-augmented LLMs. We first formally define EDEAs and propose a unified framework decomposing their design into three components: extraction instruction, jailbreak operator, and retrieval trigger, under which prior attacks can be considered instances within our framework. Guided by this framework, we develop SECRET: a Scalable and EffeCtive exteRnal data Extraction aTtack. Specifically, SECRET incorporates (1) an adaptive optimization process using LLMs as optimizers to generate specialized jailbreak prompts for EDEAs, and (2) cluster-focused triggering, an adaptive strategy that alternates between global exploration and local exploitation to efficiently generate effective retrieval triggers. Extensive evaluations across 4 models reveal that SECRET significantly outperforms previous attacks, and is highly effective against all 16 tested RAG instances. Notably, SECRET successfully extracts 35% of the data from RAG powered by Claude 3.7 Sonnet for the first time, whereas other attacks yield 0% extraction. Our findings call for attention to this emerging threat.

Problem

Research questions and friction points this paper is trying to address.

Formalizing external data extraction attacks against retrieval-augmented LLMs

Developing scalable attacks to extract sensitive data from RAG systems

Evaluating extraction feasibility across multiple models and RAG instances

Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive optimization using LLMs as optimizers

Cluster-focused triggering with global and local strategies

Unified framework defining extraction instruction and jailbreak components

🔎 Similar Papers

"Yes, My LoRD."Guiding Language Model Extraction with Locality Reinforced Distillation