🤖 AI Summary
This work addresses the weak cross-model transferability of adversarial examples and the lack of standardized safety evaluation for multimodal large language models (MLLMs). The authors conduct the first systematic empirical assessment of adversarial transferability across mainstream MLLMs, uncovering pervasive cross-architecture vulnerabilities. To enhance transferability, they propose the Typography Augment Transferability Method (TATM), which combines typography-based input diversification with editing across vision-language modality information to increase both the information diversity and the generalizability of adversarial examples. Evaluated on two realistic safety-critical scenarios, "Harmful Word Insertion" and "Important Information Protection," TATM significantly boosts cross-model attack success rates, achieving an average 32.7% improvement in transfer performance. This work establishes a new paradigm, provides novel tooling, and introduces an empirical benchmark for advancing adversarial robustness research in MLLMs.
📝 Abstract
Recently, Multimodal Large Language Models (MLLMs) have achieved remarkable performance on numerous zero-shot tasks thanks to their outstanding cross-modal interaction and comprehension abilities. However, MLLMs have been found to remain vulnerable to human-imperceptible adversarial examples. In exploring security vulnerabilities in real-world scenarios, transferability, which enables cross-model impact, is considered the greatest threat posed by adversarial examples. Yet there is currently no systematic research on the threat of cross-MLLM adversarial transferability. Therefore, this paper takes the first step toward a comprehensive evaluation of the transferability of adversarial examples generated by various MLLMs. Furthermore, we leverage two key factors that influence transferability performance: 1) the strength of the information diversity involved in the adversarial generation process; and 2) editing across vision-language modality information. Building on these factors, we propose a boosting method called the Typography Augment Transferability Method (TATM) to further investigate adversarial transferability across MLLMs. Through extensive experimental validation, TATM demonstrates exceptional performance in the real-world applications of "Harmful Word Insertion" and "Important Information Protection".
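The summary above does not spell out TATM's implementation. Purely as an illustration of the two ingredients it names — typography-based input augmentation for diversity, plus gradient-based perturbation against a surrogate model — here is a minimal NumPy toy sketch. The block-patch "typography" renderer, the random linear "encoder" standing in for an MLLM vision tower, and all step sizes are hypothetical stand-ins, not the paper's actual method.

```python
import numpy as np

rng = np.random.default_rng(0)

def paste_typography(img, value=1.0, size=8, pos=(4, 4)):
    """Toy stand-in for rendering a typographic word onto the image:
    overwrites a small pixel block (a real attack would rasterize text)."""
    out = img.copy()
    r, c = pos
    out[r:r + size, c:c + size] = value
    return out

# Hypothetical linear surrogate "encoder" standing in for an MLLM vision tower.
W = rng.normal(size=(32, 32 * 32))

def encode(img):
    return W @ img.ravel()

def fgsm_step(img, target_emb, eps=0.03):
    """One FGSM-style step pulling the surrogate embedding toward target_emb.
    Loss = 0.5 * ||encode(img) - target_emb||^2, so grad_img = W^T (W x - t)."""
    grad = (W.T @ (encode(img) - target_emb)).reshape(img.shape)
    return np.clip(img - eps * np.sign(grad), 0.0, 1.0)

# "Harmful Word Insertion" flavour: push the clean image's embedding toward
# that of a typography-augmented version of itself.
img = rng.uniform(size=(32, 32))
target = encode(paste_typography(img))

adv = img
for _ in range(10):
    adv = fgsm_step(adv, target)
```

In a real transfer attack, gradients would come from one or more source MLLMs' encoders rather than a linear toy, and the perturbation would then be evaluated against unseen target models; the sketch only shows the optimization shape, with an L-infinity budget of at most 10 × 0.03 per pixel here.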