Medusa: Cross-Modal Transferable Adversarial Attacks on Multimodal Medical Retrieval-Augmented Generation

📅 2025-11-24
📈 Citations: 0
Influential citations: 0
🤖 AI Summary
This work exposes a cross-modal adversarial vulnerability in black-box multimodal medical retrieval-augmented generation (MMed-RAG) systems: visual input perturbations can corrupt image–text retrieval alignment and undermine clinical decision reliability. To demonstrate this vulnerability, the authors propose Medusa, a framework built around a multi-positive InfoNCE loss and a dual-loop optimization strategy, combined with invariant risk minimization (IRM) to improve cross-model transferability. Medusa mounts efficient black-box attacks by ensembling surrogate models and co-optimizing adversarial perturbations across them. Evaluated on real-world medical datasets, it achieves a mean attack success rate of 90.7%, substantially outperforming existing methods, and it remains effective against four major defense paradigms: input sanitization, adversarial training, feature denoising, and certified defenses. The work establishes a new paradigm for security assessment and mitigation in medical AI systems.
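
For orientation, a generic multi-positive InfoNCE objective over an adversarial image embedding and a set of target texts could take the form below. The paper does not spell out its exact weighting or similarity function here, so this is an assumed textbook-style form rather than Medusa's own equation:

$$
\mathcal{L}_{\mathrm{MPIL}}(v) \;=\; -\frac{1}{|\mathcal{P}|} \sum_{t^{+} \in \mathcal{P}} \log \frac{\exp\!\big(\operatorname{sim}(v, t^{+})/\tau\big)}{\sum_{t \in \mathcal{P} \cup \mathcal{N}} \exp\!\big(\operatorname{sim}(v, t)/\tau\big)}
$$

where $v$ is the embedding of the perturbed image, $\mathcal{P}$ the set of malicious target texts, $\mathcal{N}$ the set of benign corpus texts, $\operatorname{sim}(\cdot,\cdot)$ a cosine similarity, and $\tau$ a temperature; minimizing this over the image perturbation pulls the perturbed image toward all targets at once.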

📝 Abstract
With the rapid advancement of retrieval-augmented vision-language models, multimodal medical retrieval-augmented generation (MMed-RAG) systems are increasingly adopted in clinical decision support. These systems enhance medical applications by performing cross-modal retrieval to integrate relevant visual and textual evidence for tasks such as report generation and disease diagnosis. However, their complex architecture also introduces underexplored adversarial vulnerabilities, particularly via visual input perturbations. In this paper, we propose Medusa, a novel framework for crafting cross-modal transferable adversarial attacks on MMed-RAG systems under a black-box setting. Specifically, Medusa formulates the attack as a perturbation optimization problem, leveraging a multi-positive InfoNCE loss (MPIL) to align adversarial visual embeddings with medically plausible but malicious textual targets, thereby hijacking the retrieval process. To enhance transferability, we adopt a surrogate model ensemble and design a dual-loop optimization strategy augmented with invariant risk minimization (IRM). Extensive experiments on two real-world medical tasks, medical report generation and disease diagnosis, demonstrate that Medusa achieves an average attack success rate of over 90% across various generation models and retrievers under appropriate parameter configurations, while remaining robust against four mainstream defenses and outperforming state-of-the-art baselines. Our results reveal critical vulnerabilities in MMed-RAG systems and highlight the necessity of robustness benchmarking in safety-critical medical applications. The code and data are available at https://anonymous.4open.science/r/MMed-RAG-Attack-F05A.
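
In code, a multi-positive InfoNCE loss of the shape sketched above might look like the following PyTorch snippet. The function name, normalization choices, and temperature are illustrative assumptions, not the released implementation linked in the abstract:

```python
import torch
import torch.nn.functional as F

def multi_positive_infonce(adv_img_emb, target_txt_embs, negative_txt_embs, tau=0.07):
    """Multi-positive InfoNCE-style loss (illustrative sketch, not the paper's code).

    Pulls the embedding of the perturbed image toward several malicious-but-plausible
    target text embeddings (positives) while pushing it away from the remaining
    corpus embeddings (negatives). Inputs are assumed to be feature vectors from a
    CLIP-like surrogate encoder: adv_img_emb (d,), targets (P, d), negatives (N, d).
    """
    a = F.normalize(adv_img_emb, dim=-1)           # (d,)
    pos = F.normalize(target_txt_embs, dim=-1)     # (P, d)
    neg = F.normalize(negative_txt_embs, dim=-1)   # (N, d)

    pos_sim = pos @ a / tau                        # (P,) similarities to targets
    neg_sim = neg @ a / tau                        # (N,) similarities to distractors
    all_sim = torch.cat([pos_sim, neg_sim])        # (P + N,)

    # Average the InfoNCE terms over all positives.
    log_denom = torch.logsumexp(all_sim, dim=0)
    return -(pos_sim - log_denom).mean()
```

Minimizing this quantity with respect to the image perturbation pushes the perturbed image's embedding toward all of the attacker's target texts simultaneously, which is what lets the retrieval step be hijacked.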
Problem

Research questions and friction points this paper is trying to address.

Attacks multimodal medical AI systems via adversarial visual perturbations
Hijacks the retrieval process by aligning images with malicious text targets
Exposes security vulnerabilities in clinical decision support systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses a multi-positive InfoNCE loss for embedding alignment
Employs a surrogate model ensemble for attack transferability
Implements dual-loop optimization with invariant risk minimization (see the sketch after this list)
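
The ensemble, dual-loop optimization, and IRM ingredients listed above could be wired together roughly as in the sketch below. The L_inf budget, step size, iteration count, and the IRMv1-style dummy-scale penalty are assumptions for illustration, and the multi-positive objective from the earlier snippet is inlined so the dummy can scale the retrieval logits; none of this is taken from the authors' released code:

```python
import torch
import torch.nn.functional as F

def medusa_style_attack(image, surrogates, target_txt_embs, negative_txt_embs,
                        eps=8 / 255, alpha=1 / 255, steps=50, tau=0.07, irm_weight=1.0):
    """Illustrative dual-loop attack over a surrogate ensemble (hypothetical sketch).

    Outer loop: signed-gradient updates of an L_inf-bounded perturbation `delta`.
    Inner loop: each surrogate encoder is treated as an IRM "environment"; an
    IRMv1-style penalty (squared gradient w.r.t. a dummy logit scale) discourages
    perturbations that only work for a single surrogate, aiming at transfer to
    the unseen black-box retriever. `image` is assumed to have values in [0, 1].
    """
    pos = F.normalize(target_txt_embs, dim=-1)       # (P, d) malicious targets
    neg = F.normalize(negative_txt_embs, dim=-1)     # (N, d) benign corpus texts
    texts = torch.cat([pos, neg])                    # (P + N, d)
    delta = torch.zeros_like(image, requires_grad=True)

    for _ in range(steps):
        total = 0.0
        for encode in surrogates:                    # inner loop over "environments"
            dummy = torch.ones(1, requires_grad=True)            # IRMv1 dummy scale
            img = F.normalize(encode(image + delta), dim=-1)     # (d,)
            logits = (texts @ img) / tau * dummy                 # (P + N,)
            env_loss = -(logits[: pos.shape[0]] - torch.logsumexp(logits, dim=0)).mean()
            grad_dummy, = torch.autograd.grad(env_loss, dummy, create_graph=True)
            total = total + env_loss + irm_weight * grad_dummy.pow(2).sum()

        grad_delta, = torch.autograd.grad(total, delta)
        with torch.no_grad():
            delta -= alpha * grad_delta.sign()       # step toward stronger alignment
            delta.clamp_(-eps, eps)                  # stay within the L_inf budget
    return (image + delta).clamp(0, 1).detach()
```

The split into an outer perturbation loop and an inner loop over surrogates is one plausible reading of "dual-loop optimization"; the paper should be consulted for the actual schedule and penalty form.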
Authors

Yingjia Shang
Westlake University and Heilongjiang University
Yi Liu
City University of Hong Kong
Huimin Wang
Tencent
Furong Li
Westlake University
Wenfang Sun
University of Amsterdam & Westlake University & USTC
VLM, LLM, LMM, Meta-learning
Wu Chengyu
Westlake University
Yefeng Zheng
Professor, Westlake University, Hangzhou, China; IEEE Fellow, AIMBE Fellow
AI in Health, Medical Imaging, Computer Vision, Natural Language Processing, Large Language Model