🤖 AI Summary
This work addresses the weak cross-model transferability of adversarial examples and the lack of standardized safety evaluation for multimodal large language models (MLLMs). The authors conduct the first systematic empirical assessment of adversarial transferability across mainstream MLLMs, uncovering pervasive cross-architecture vulnerabilities. To enhance transferability, they propose the Typography Augment Transferability Method (TATM), which combines typography-based input diversification with editing across vision-language modality information to increase both the information diversity and the generalizability of adversarial examples. Evaluated on two realistic safety-critical scenarios, "Harmful Word Insertion" and "Important Information Protection," TATM significantly boosts cross-model attack success rates, achieving an average 32.7% improvement in transfer performance. This work establishes a new paradigm, provides novel tooling, and introduces an empirical benchmark for advancing adversarial robustness research in MLLMs.
📝 Abstract
Recently, Multimodal Large Language Models (MLLMs) have achieved remarkable performance on numerous zero-shot tasks thanks to their outstanding cross-modal interaction and comprehension abilities. However, MLLMs have been found to remain vulnerable to human-imperceptible adversarial examples. In exploring security vulnerabilities in real-world scenarios, transferability, which enables cross-model impact, is considered the greatest threat posed by adversarial examples. Yet there is currently no systematic research on the threat of cross-MLLM adversarial transferability. Therefore, this paper takes the first step toward a comprehensive evaluation of the transferability of adversarial examples generated by various MLLMs. Furthermore, we leverage two key factors that influence transferability performance: 1) the strength of the information diversity involved in the adversarial generation process; and 2) editing across vision-language modality information. Building on these factors, we propose a boosting method called the Typography Augment Transferability Method (TATM) to further investigate adversarial transferability across MLLMs. Through extensive experimental validation, TATM demonstrates exceptional performance in the real-world applications of "Harmful Word Insertion" and "Important Information Protection".
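The summary above does not spell out TATM's implementation. Purely as an illustration of the two ingredients it names — typography-based input augmentation for diversity, plus gradient-based perturbation against a surrogate model — here is a minimal NumPy toy sketch. The block-patch "typography" renderer, the random linear "encoder" standing in for an MLLM vision tower, and all step sizes are hypothetical stand-ins, not the paper's actual method.

```python
import numpy as np

rng = np.random.default_rng(0)

def paste_typography(img, value=1.0, size=8, pos=(4, 4)):
    """Toy stand-in for rendering a typographic word onto the image:
    overwrites a small pixel block (a real attack would rasterize text)."""
    out = img.copy()
    r, c = pos
    out[r:r + size, c:c + size] = value
    return out

# Hypothetical linear surrogate "encoder" standing in for an MLLM vision tower.
W = rng.normal(size=(32, 32 * 32))

def encode(img):
    return W @ img.ravel()

def fgsm_step(img, target_emb, eps=0.03):
    """One FGSM-style step pulling the surrogate embedding toward target_emb.
    Loss = 0.5 * ||encode(img) - target_emb||^2, so grad_img = W^T (W x - t)."""
    grad = (W.T @ (encode(img) - target_emb)).reshape(img.shape)
    return np.clip(img - eps * np.sign(grad), 0.0, 1.0)

# "Harmful Word Insertion" flavour: push the clean image's embedding toward
# that of a typography-augmented version of itself.
img = rng.uniform(size=(32, 32))
target = encode(paste_typography(img))

adv = img
for _ in range(10):
    adv = fgsm_step(adv, target)
```

In a real transfer attack, gradients would come from one or more source MLLMs' encoders rather than a linear toy, and the perturbation would then be evaluated against unseen target models; the sketch only shows the optimization shape, with an L-infinity budget of at most 10 × 0.03 per pixel here.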