MrM: Black-Box Membership Inference Attacks against Multimodal RAG Systems

📅 2025-06-09
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Multimodal RAG systems enhance vision-language model (VLM) performance but remain vulnerable to black-box membership inference attacks (MIAs); effective attack and evaluation methodologies tailored to the visual modality are still lacking. To address this gap, we propose the first black-box MIA framework specifically designed for multimodal RAG systems. Our approach integrates object-aware image perturbation, counterfactual-driven mask selection, and statistical modeling of response patterns to enable high-accuracy inference of knowledge-base membership, balancing retrieval controllability with maximal semantic leakage. We validate the framework on two benchmark vision datasets and eight state-of-the-art VLMs—including GPT-4o and Gemini-2—demonstrating statistically significant improvements in both sample-level and set-level attack accuracy over existing baselines. Moreover, the attack remains robust against multiple adaptive defenses, underscoring its practical threat relevance.

📝 Abstract
Multimodal retrieval-augmented generation (RAG) systems enhance large vision-language models by integrating cross-modal knowledge, enabling their increasing adoption across real-world multimodal tasks. These knowledge databases may contain sensitive information that requires privacy protection. However, multimodal RAG systems inherently grant external users indirect access to such data, making them potentially vulnerable to privacy attacks, particularly membership inference attacks (MIAs). Existing MIA methods targeting RAG systems predominantly focus on the textual modality, while the visual modality remains relatively underexplored. To bridge this gap, we propose MrM, the first black-box MIA framework targeted at multimodal RAG systems. It utilizes a multi-object data perturbation framework constrained by counterfactual attacks, which can concurrently induce the RAG systems to retrieve the target data and generate information that leaks the membership information. Our method first employs an object-aware data perturbation method to constrain the perturbation to key semantics and ensure successful retrieval. Building on this, we design a counterfact-informed mask selection strategy to prioritize the most informative masked regions, aiming to eliminate the interference of model self-knowledge and amplify attack efficacy. Finally, we perform statistical membership inference by modeling query trials to extract features that reflect the reconstruction of masked semantics from response patterns. Experiments on two visual datasets and eight mainstream commercial visual-language models (e.g., GPT-4o, Gemini-2) demonstrate that MrM achieves consistently strong performance across both sample-level and set-level evaluations, and remains robust under adaptive defenses.
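The abstract's final stage — statistical membership inference over repeated query trials — can be illustrated with a minimal sketch using a one-sided binomial test: if the masked semantics are reconstructed far more often than a non-member baseline would allow, the sample is flagged as a member. The function names, the substring check for "reconstruction," and the `baseline_rate` calibration constant are illustrative assumptions here, not the paper's actual statistical model:

```python
import math

def binomial_tail(k, n, p):
    """One-sided tail probability P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def infer_membership(responses, target_labels, baseline_rate=0.2, alpha=0.05):
    """Decide membership from repeated masked-query trials.

    responses: model answers to perturbed queries about one candidate sample.
    target_labels: the masked object names the answers should reconstruct.
    baseline_rate: assumed reconstruction chance when the sample is NOT in
    the knowledge base (a hypothetical calibration constant, not from MrM).
    """
    n = len(responses)
    # Count trials where the masked semantics were reconstructed.
    k = sum(label.lower() in resp.lower()
            for resp, label in zip(responses, target_labels))
    p_val = binomial_tail(k, n, baseline_rate)
    # Significantly more reconstructions than chance -> likely a member.
    return p_val < alpha, p_val
```

The design choice this illustrates is that each trial is a noisy bit of evidence; aggregating many trials statistically is what turns weak per-query leakage into a confident set- or sample-level decision.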
Problem

Research questions and friction points this paper is trying to address.

Privacy risks in multimodal RAG systems from membership inference attacks
Lack of visual-focused MIA methods for multimodal RAG systems
Black-box attack vulnerability in commercial vision-language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Black-box MIA framework for multimodal RAG
Multi-object data perturbation with counterfactual attacks
Counterfact-informed mask selection strategy
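The counterfact-informed mask selection idea can be sketched as a simple filter: probe the VLM with retrieval disabled on each masked candidate, and keep only the masks whose hidden object the bare model cannot guess from its own knowledge — so that a later correct answer must leak from the knowledge base. `query_without_rag` and every name below are hypothetical placeholders, not the paper's interface:

```python
def counterfact_informed_selection(candidate_masks, query_without_rag, target_label):
    """Keep masks the model cannot resolve from self-knowledge alone.

    candidate_masks: masked variants of the target image (any representation).
    query_without_rag: callable returning the model's answer with retrieval
    disabled -- a counterfactual probe of model self-knowledge.
    """
    return [mask for mask in candidate_masks
            if target_label.lower() not in query_without_rag(mask).lower()]
```

Filtering out masks the model answers unaided removes the main confounder in black-box MIAs: responses driven by pretraining knowledge rather than by the retrieved record.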