Beyond Text: Unveiling Privacy Vulnerabilities in Multi-modal Retrieval-Augmented Generation

📅 2025-05-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work first systematically exposes privacy leakage risks in multimodal retrieval-augmented generation (MRAG) systems across vision-language and speech-language scenarios. Addressing the gap in existing RAG privacy research—largely confined to text-only modalities—we propose the first cross-modal privacy threat taxonomy for MRAG. We further design the first black-box, compositional, structured prompt attack tailored to multimodal RAG, empirically uncovering two distinct leakage pathways in large multimodal models (LMMs): “direct reproduction” and “semantic inference.” Evaluations across multiple mainstream MRAG systems demonstrate the attack’s effectiveness: private content—including image captions and speech transcriptions—is successfully extracted, with a maximum leakage rate of 78.3%. Our analysis reveals that current MRAG systems widely lack coordinated, cross-modal privacy safeguards. These findings underscore the urgent need to develop robust, privacy-preserving MRAG frameworks capable of mitigating leakage across heterogeneous modalities.

Technology Category

Application Category

📝 Abstract

Multimodal Retrieval-Augmented Generation (MRAG) systems enhance LMMs by integrating external multimodal databases, but introduce unexplored privacy vulnerabilities. While text-based RAG privacy risks have been studied, multimodal data presents unique challenges. We provide the first systematic analysis of MRAG privacy vulnerabilities across vision-language and speech-language modalities. Using a novel compositional structured prompt attack in a black-box setting, we demonstrate how attackers can extract private information by manipulating queries. Our experiments reveal that LMMs can both directly generate outputs resembling retrieved content and produce descriptions that indirectly expose sensitive information, highlighting the urgent need for robust privacy-preserving MRAG techniques.

Problem

Research questions and friction points this paper is trying to address.

Analyzing privacy risks in multimodal retrieval-augmented generation systems

Exploring vulnerabilities in vision-language and speech-language MRAG modalities

Demonstrating private data leakage via structured prompt attacks

Innovation

Methods, ideas, or system contributions that make the work stand out.

Compositional structured prompt attack technique

Black-box query manipulation for data extraction

Systematic privacy analysis across multimodal systems

🔎 Similar Papers

No similar papers found.

Authors to Follow