Beyond Text: Unveiling Privacy Vulnerabilities in Multi-modal Retrieval-Augmented Generation

📅 2025-05-20
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work first systematically exposes privacy leakage risks in multimodal retrieval-augmented generation (MRAG) systems across vision-language and speech-language scenarios. Addressing the gap in existing RAG privacy research—largely confined to text-only modalities—we propose the first cross-modal privacy threat taxonomy for MRAG. We further design the first black-box, compositional, structured prompt attack tailored to multimodal RAG, empirically uncovering two distinct leakage pathways in large multimodal models (LMMs): “direct reproduction” and “semantic inference.” Evaluations across multiple mainstream MRAG systems demonstrate the attack’s effectiveness: private content—including image captions and speech transcriptions—is successfully extracted, with a maximum leakage rate of 78.3%. Our analysis reveals that current MRAG systems widely lack coordinated, cross-modal privacy safeguards. These findings underscore the urgent need to develop robust, privacy-preserving MRAG frameworks capable of mitigating leakage across heterogeneous modalities.

Technology Category

Application Category

📝 Abstract
Multimodal Retrieval-Augmented Generation (MRAG) systems enhance LMMs by integrating external multimodal databases, but introduce unexplored privacy vulnerabilities. While text-based RAG privacy risks have been studied, multimodal data presents unique challenges. We provide the first systematic analysis of MRAG privacy vulnerabilities across vision-language and speech-language modalities. Using a novel compositional structured prompt attack in a black-box setting, we demonstrate how attackers can extract private information by manipulating queries. Our experiments reveal that LMMs can both directly generate outputs resembling retrieved content and produce descriptions that indirectly expose sensitive information, highlighting the urgent need for robust privacy-preserving MRAG techniques.
Problem

Research questions and friction points this paper is trying to address.

Analyzing privacy risks in multimodal retrieval-augmented generation systems
Exploring vulnerabilities in vision-language and speech-language MRAG modalities
Demonstrating private data leakage via structured prompt attacks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Compositional structured prompt attack technique
Black-box query manipulation for data extraction
Systematic privacy analysis across multimodal systems
🔎 Similar Papers
No similar papers found.
J
Jiankun Zhang
Jilin University
Shenglai Zeng
Shenglai Zeng
Michigan State University
Large language modelsRetrieval-augmented GenerationInformation retrievalAI safety
J
Jie Ren
Michigan State University
T
Tianqi Zheng
Amazon.com
H
Hui Liu
Amazon.com
Xianfeng Tang
Xianfeng Tang
Amazon
Machine LearningLarge Language Models
Y
Yi Chang
Michigan State University