🤖 AI Summary
This work addresses the insufficient robustness of existing medical vision-language models (VLMs) against transferable adversarial attacks, noting that existing attacks are often visibly perceptible or yield clinically implausible outputs. The authors propose MedFocusLeak, a black-box, highly transferable multimodal adversarial attack that they present as the first to reveal the sensitivity of medical VLMs to perturbations in non-diagnostic background regions. By leveraging an attention-guided background perturbation strategy, MedFocusLeak induces models to generate clinically plausible yet erroneous diagnoses while keeping the image modifications imperceptible. The method combines multimodal alignment perturbations with background-targeted optimization and achieves state-of-the-art attack performance across six medical imaging modalities. The study also introduces a unified evaluation framework and novel metrics that jointly assess attack success rate and image fidelity, exposing critical vulnerabilities in the clinical reasoning capabilities of current medical VLMs.
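To make the idea of a background-restricted perturbation concrete, here is a minimal sketch, not the authors' implementation. It assumes a binary `background_mask` (1 for non-diagnostic pixels, 0 for the diagnostic region) and an L-infinity budget `epsilon`. Both names, the function `apply_background_perturbation`, and the toy image are hypothetical; how MedFocusLeak actually derives its mask and optimizes the perturbation is not described in this summary.

```python
import numpy as np

def apply_background_perturbation(image, delta, background_mask, epsilon=4 / 255):
    """Add `delta` only where `background_mask` is 1, under an L-infinity budget.

    All arguments are arrays of the same shape with pixel values in [0, 1].
    The mask and budget are illustrative assumptions, not values from the paper.
    """
    # Keep every perturbation entry within the imperceptibility budget.
    delta = np.clip(delta, -epsilon, epsilon)
    # Zero the perturbation inside the diagnostic region.
    delta = delta * background_mask
    # Keep the resulting pixels in the valid intensity range.
    return np.clip(image + delta, 0.0, 1.0)

# Toy example: an 8x8 "scan" with a hypothetical 4x4 diagnostic region.
rng = np.random.default_rng(0)
image = rng.uniform(0.2, 0.8, size=(8, 8))
mask = np.ones((8, 8))
mask[2:6, 2:6] = 0.0  # diagnostic region is left untouched
delta = rng.normal(0.0, 0.1, size=(8, 8))
adv = apply_background_perturbation(image, delta, mask)
```

In an actual attack, `delta` would come from an optimization loop against surrogate models rather than from random noise; the sketch only shows the masking and projection step.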
📝 Abstract
Vision-Language Models (VLMs) are increasingly used in clinical diagnostics, yet their robustness to adversarial attacks remains largely unexplored, which poses serious risks. Existing medical attacks focus on secondary objectives such as model stealing or adversarial fine-tuning, while transferable attacks developed for natural images introduce visible distortions that clinicians can easily detect. To address this gap, we propose MedFocusLeak, a highly transferable black-box multimodal attack that induces incorrect yet clinically plausible diagnoses while keeping perturbations imperceptible. The method injects coordinated perturbations into non-diagnostic background regions and employs an attention distraction mechanism to shift the model's focus away from pathological areas. Extensive evaluations across six medical imaging modalities show that MedFocusLeak achieves state-of-the-art performance, generating misleading yet realistic diagnostic outputs across diverse VLMs. We further introduce a unified evaluation framework with novel metrics that jointly capture attack success and image fidelity, revealing a critical weakness in the reasoning capabilities of modern clinical VLMs.
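The abstract does not define the proposed metrics, so the following is only a hypothetical illustration of how attack success and image fidelity could be scored jointly. It counts an attack as a hit only if it both changes the diagnosis and keeps the adversarial image above a PSNR threshold. The function names, the 40 dB threshold, and the sample results are assumptions made for this sketch, not values from the paper.

```python
import math

def psnr(clean, adversarial, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    mse = sum((c - a) ** 2 for c, a in zip(clean, adversarial)) / len(clean)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

def fidelity_gated_success_rate(results, min_psnr_db=40.0):
    """Fraction of attacks that succeed AND stay above a PSNR threshold.

    `results` is an iterable of (succeeded, psnr_db) pairs.
    The threshold is an illustrative assumption.
    """
    results = list(results)
    if not results:
        return 0.0
    hits = sum(1 for ok, p in results if ok and p >= min_psnr_db)
    return hits / len(results)

# Hypothetical outcomes for four attacked images: (attack succeeded, PSNR in dB).
sample = [(True, 45.0), (True, 30.0), (False, 50.0), (True, 41.0)]
rate = fidelity_gated_success_rate(sample)
```

A gated score like this penalizes attacks that succeed only by visibly degrading the image, which is the trade-off the abstract says the evaluation framework is meant to capture.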