Silent Leaks: Implicit Knowledge Extraction Attack on RAG Systems through Benign Queries

📅 2025-05-21

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work identifies an implicit knowledge leakage vulnerability in Retrieval-Augmented Generation (RAG) systems under benign user queries. Prior extraction attacks rely on adversarial inputs such as prompt injection or jailbreaking, which input- or output-level defenses detect easily. We propose the Implicit Knowledge Extraction Attack (IKEA), which exfiltrates private knowledge base content using only benign, natural-looking queries. IKEA generates queries from anchor concepts and combines two mechanisms: Experience Reflection Sampling, which samples anchor concepts based on past query-response patterns, and Trust Region Directed Mutation, which iteratively mutates anchor concepts under embedding-space similarity constraints. Experiments under various defenses show that IKEA surpasses baselines by over 80% in extraction efficiency and 90% in attack success rate, and that a substitute RAG system built from its extractions consistently outperforms those built from baseline methods across multiple evaluation tasks. The results expose a significant privacy risk in RAG systems and offer insights for robust defense design.

📝 Abstract

Retrieval-Augmented Generation (RAG) systems enhance large language models (LLMs) by incorporating external knowledge bases, but they are vulnerable to privacy risks from data extraction attacks. Existing extraction methods typically rely on malicious inputs such as prompt injection or jailbreaking, making them easily detectable via input- or output-level detection. In this paper, we introduce the Implicit Knowledge Extraction Attack (IKEA), which conducts knowledge extraction on RAG systems through benign queries. IKEA first leverages anchor concepts to generate natural-looking queries, and then uses two mechanisms that lead the anchor concepts to thoroughly explore the RAG system's private knowledge: (1) Experience Reflection Sampling, which samples anchor concepts based on past query-response patterns to ensure the queries' relevance to RAG documents; (2) Trust Region Directed Mutation, which iteratively mutates anchor concepts under similarity constraints to further exploit the embedding space. Extensive experiments demonstrate IKEA's effectiveness under various defenses, surpassing baselines by over 80% in extraction efficiency and 90% in attack success rate. Moreover, the substitute RAG system built from IKEA's extractions consistently outperforms those based on baseline methods across multiple evaluation tasks, underscoring the significant privacy risk in RAG systems.
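
To make the idea of mutating anchor concepts under similarity constraints more concrete, here is a minimal sketch. It is not the paper's algorithm: the abstract does not give the exact rule, so the sketch assumes the trust region is "cosine similarity to the current anchor of at least `min_sim`" and that, within the region, the candidate least similar to previously used anchors is preferred. The names `cosine`, `mutate_anchor`, `min_sim`, and the toy 2-D embeddings are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mutate_anchor(current, candidates, visited, min_sim=0.5):
    """Pick the next anchor concept from `candidates` (name -> embedding).

    Assumed rule (illustrative only):
      1. keep candidates whose similarity to `current` is >= min_sim
         (the "trust region");
      2. among those, return the one whose highest similarity to any
         already-visited embedding is lowest, to push into unexplored space.
    Returns None when no candidate falls inside the trust region.
    """
    in_region = {
        name: vec for name, vec in candidates.items()
        if cosine(current, vec) >= min_sim
    }
    if not in_region:
        return None

    def novelty(name):
        vec = in_region[name]
        # Negated so that a lower maximum similarity means higher novelty.
        return -max((cosine(vec, seen) for seen in visited), default=0.0)

    return max(in_region, key=novelty)


# Toy 2-D embeddings, purely for illustration.
current_anchor = [1.0, 0.0]
candidate_pool = {"a": [0.9, 0.1], "b": [0.6, 0.8], "c": [0.0, 1.0]}
next_anchor = mutate_anchor(current_anchor, candidate_pool, visited=[current_anchor])
```

In this toy setup, candidate `c` lies outside the trust region, and of the two remaining candidates `b` is the less similar to what has already been queried, so it is the one selected. Tightening `min_sim` trades exploration for staying close to concepts already known to be relevant.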

Problem

Research questions and friction points this paper is trying to address.

Extracting private knowledge from RAG systems using benign queries

Bypassing detection with natural-looking queries and iterative mutation

Demonstrating high extraction efficiency and attack success rates

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses benign queries for implicit knowledge extraction

Leverages anchor concepts for natural query generation

Employs reflection sampling and trust region mutation
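
As a rough illustration of sampling anchor concepts based on past query-response patterns, the sketch below weights each anchor by a smoothed success rate computed from a query history. This is a simplification chosen for illustration, not the paper's sampling rule: the per-anchor Laplace-smoothed rate, the boolean "useful response" flag, and the names `sampling_weights` and `sample_anchor` are all assumptions.

```python
import random


def sampling_weights(anchors, history):
    """Return one weight per anchor from (anchor, was_useful) history pairs.

    Uses a Laplace-smoothed success rate, (successes + 1) / (attempts + 2),
    so untried anchors get a neutral weight of 0.5 and anchors that produced
    refusals or off-topic responses are sampled less often.
    """
    weights = []
    for anchor in anchors:
        outcomes = [ok for name, ok in history if name == anchor]
        weights.append((sum(outcomes) + 1) / (len(outcomes) + 2))
    return weights


def sample_anchor(anchors, history, rng=random):
    """Draw one anchor with probability proportional to its weight."""
    weights = sampling_weights(anchors, history)
    return rng.choices(anchors, weights=weights, k=1)[0]


# Hypothetical anchors and outcomes, purely for illustration.
anchors = ["diabetes", "insulin", "football"]
history = [("diabetes", True), ("diabetes", True), ("football", False)]
weights = sampling_weights(anchors, history)
picked = sample_anchor(anchors, history, rng=random.Random(0))
```

Here the anchor with two useful responses receives the largest weight, the untried one stays neutral, and the one that previously failed is down-weighted but never excluded, which keeps some exploration in the loop.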

🔎 Similar Papers

Is My Data in Your Retrieval Database? Membership Inference Attacks Against Retrieval Augmented Generation