🤖 AI Summary
Multimodal large language models (MLLMs) exhibit poor transferability under cross-model adversarial attacks, primarily because their vision-language alignment mechanisms are heterogeneous. To address this, we propose the Dynamic Vision-Language Alignment (DynVLA) Attack, the first method to inject gradient-driven dynamic perturbations into the vision-language connector, adapting adversarial inputs to the diverse alignment architectures of different MLLMs. DynVLA jointly models modality alignment and incorporates a black-box transfer evaluation framework. Evaluated on both open-source (BLIP-2, InstructBLIP, MiniGPT-4, LLaVA) and closed-source (Gemini) models, it achieves an average 32.7% improvement in targeted-attack transfer success rate. Its core innovation is a departure from static perturbation paradigms: by applying dynamic, connector-level perturbations, DynVLA decouples the attack from architectural differences in vision-language alignment, significantly enhancing cross-architecture adversarial generalization.
📝 Abstract
Multimodal Large Language Models (MLLMs), built upon LLMs, have recently gained attention for their capabilities in image recognition and understanding. However, while MLLMs are vulnerable to adversarial attacks, the transferability of these attacks across different models remains limited, especially under the targeted attack setting. Existing methods primarily focus on vision-specific perturbations but struggle with the complex nature of vision-language modality alignment. In this work, we introduce the Dynamic Vision-Language Alignment (DynVLA) Attack, a novel approach that injects dynamic perturbations into the vision-language connector to enhance generalization across the diverse vision-language alignment mechanisms of different models. Our experimental results show that DynVLA significantly improves the transferability of adversarial examples across various MLLMs, including BLIP2, InstructBLIP, MiniGPT4, LLaVA, and closed-source models such as Gemini.
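The connector-level idea can be illustrated with a toy, self-contained sketch. This is not the paper's implementation: the "vision encoder", "connector", and "language head" here are plain linear maps, the per-step Gaussian connector noise, dimensions, and step sizes are all illustrative assumptions. The one idea it does mirror is that the connector is re-perturbed at every attack iteration, so the adversarial input cannot overfit a single fixed alignment module.

```python
import numpy as np

rng = np.random.default_rng(0)

D, C = 16, 10                                     # feature dim, class count (toy sizes)
W_v = rng.standard_normal((D, D)) / np.sqrt(D)    # toy "vision encoder"
W_c = rng.standard_normal((D, D)) / np.sqrt(D)    # toy "vision-language connector"
W_l = rng.standard_normal((C, D)) / np.sqrt(D)    # toy "language head"

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def logits(x, delta):
    # Forward pass with a (possibly perturbed) connector.
    return W_l @ ((W_c + delta) @ (W_v @ x))

def dynamic_connector_attack(x, target, steps=400, lr=0.05, noise=0.05):
    """Targeted attack: at each step, resample a small connector
    perturbation so the gradient reflects a slightly different
    alignment module each iteration (the connector-level intuition)."""
    x = x.copy()
    onehot = np.eye(C)[target]
    for _ in range(steps):
        delta = noise * rng.standard_normal((D, D))   # dynamic connector perturbation
        p = softmax(logits(x, delta))
        # Analytic cross-entropy gradient through the linear chain.
        g = W_v.T @ (W_c + delta).T @ W_l.T @ (p - onehot)
        x -= lr * np.sign(g)                          # signed-gradient step
    return x

x0 = rng.standard_normal(D)
target = 3
x_adv = dynamic_connector_attack(x0, target)

# Evaluate on the *unperturbed* connector, a stand-in for transferring
# to an alignment module the attack never saw exactly.
print("clean pred:", int(np.argmax(logits(x0, 0.0))),
      "| adv pred:", int(np.argmax(logits(x_adv, 0.0))))
```

On this convex toy objective the signed-gradient loop reliably drives the target logit up even though each step sees a differently perturbed connector; in the actual method the perturbation targets the connector of a real MLLM and gradients come from backpropagation rather than a closed form.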