Pirates of the RAG: Adaptively Attacking LLMs to Leak Knowledge Bases

📅 2024-12-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Retrieval-augmented generation (RAG) systems face significant privacy risks, as large language models (LLMs) may inadvertently leak sensitive information from private knowledge bases. Method: We propose the first fully black-box, adaptive knowledge-base extraction attack framework. It requires no access to the target model's architecture, parameters, or training data. Leveraging attacker-side open-source LLMs (e.g., Llama, Mistral), our approach employs a relevance-driven query-generation mechanism that exploits the RAG retrieval-generation pipeline to automatically craft adversarial queries that induce disclosure of private knowledge fragments. Contribution/Results: Unlike prior attacks that are non-black-box, non-adaptive, or dependent on proprietary models, our framework achieves a 92% knowledge-fragment extraction rate across diverse domain-specific RAG pipelines. It significantly improves both attack efficacy and cross-domain generalizability, establishing a new empirical benchmark for assessing RAG privacy vulnerabilities.
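The adaptive loop described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the target RAG endpoint and the attacker-side LLM are replaced by toy stand-ins (`mock_rag_answer`, `generate_queries`), and all names are hypothetical. The key idea it demonstrates is the feedback loop: each newly leaked chunk seeds the next batch of queries, steering extraction toward uncovered parts of the hidden knowledge base.

```python
import random

def mock_rag_answer(query, knowledge_base):
    """Stand-in for the target RAG endpoint: returns the stored chunk
    with the greatest word overlap with the query (toy retrieval)."""
    def overlap(a, b):
        return len(set(a.lower().split()) & set(b.lower().split()))
    return max(knowledge_base, key=lambda chunk: overlap(query, chunk))

def generate_queries(anchor, n=3):
    """Stand-in for the attacker-side open-source LLM: derives new
    candidate queries by recombining words of a leaked chunk."""
    words = anchor.split()
    random.shuffle(words)
    return [" ".join(words[i::n]) for i in range(n) if words[i::n]]

def adaptive_extraction(seed_query, knowledge_base, budget=50):
    """Adaptive, relevance-driven loop: query the RAG system, record
    leaked chunks, and derive the next queries from fresh leaks."""
    leaked, frontier = set(), [seed_query]
    for _ in range(budget):
        if not frontier:
            break
        query = frontier.pop(0)
        chunk = mock_rag_answer(query, knowledge_base)
        if chunk not in leaked:           # novel leak: expand from it
            leaked.add(chunk)
            frontier.extend(generate_queries(chunk))
    return leaked

# Toy "private" knowledge base (hypothetical data).
kb = [
    "patient alice diagnosed with condition x in 2021",
    "patient bob prescribed drug y for condition x",
    "internal memo drug y pricing strategy 2022",
]
leaked = adaptive_extraction("patient condition", kb)
print(f"leaked {len(leaked)} of {len(kb)} chunks")
```

In the actual attack, the retrieval stand-in is the live black-box RAG service, and query generation is delegated to an open-source LLM guided by a relevance signal rather than random word recombination.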

📝 Abstract
The growing ubiquity of Retrieval-Augmented Generation (RAG) systems in several real-world services triggers severe concerns about their security. A RAG system improves the generative capabilities of a Large Language Model (LLM) through a retrieval mechanism that operates on a private knowledge base, whose unintended exposure could lead to severe consequences, including breaches of private and sensitive information. This paper presents a black-box attack that forces a RAG system to leak its private knowledge base and which, unlike existing approaches, is adaptive and automatic. A relevance-based mechanism and an attacker-side open-source LLM favor the generation of effective queries to leak most of the (hidden) knowledge base. Extensive experimentation proves the quality of the proposed algorithm across different RAG pipelines and domains, in comparison with very recent related approaches, which turn out to be either not fully black-box, not adaptive, or not based on open-source models. The findings from our study underscore the urgent need for more robust privacy safeguards in the design and deployment of RAG systems.
Problem

Research questions and friction points this paper is trying to address.

RAG Systems
Privacy Protection
Language Models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automatic Information Extraction
Black-box Attack
Privacy Protection in RAG Systems