🤖 AI Summary
This study exposes a critical vulnerability of Large Vision-Language Models (LVLMs) to self-generated textual interference, termed typographic attacks, in which semantically misleading text embedded in an image misleads the language module and induces erroneous visual reasoning and classification. The authors propose two novel attack paradigms: (1) class-based attacks, which leverage category-level semantic similarity to identify an effective deceiving label; and (2) reasoning-driven attacks, which use multi-step prompting to exploit an advanced LVLM's own generative capabilities to craft highly transferable adversarial examples. Evaluated on state-of-the-art LVLMs, including GPT-4V, InstructBLIP, and MiniGPT-4, the attacks reduce image classification accuracy by up to 60% and transfer strongly across models. These findings reveal a fundamental security blind spot in the end-to-end reasoning pipeline of LVLMs, where the tight coupling between vision and language modules amplifies susceptibility to textual adversarial manipulation.
📝 Abstract
Typographic attacks, which add misleading text to images, can deceive large vision-language models (LVLMs). The susceptibility of recent LVLMs such as GPT-4V to such attacks is understudied, raising concerns about amplified misinformation in personal assistant applications. Previous attacks use simple strategies, such as random misleading words, which do not fully exploit LVLMs' language reasoning abilities. We introduce an experimental setup for testing typographic attacks on LVLMs and propose two novel self-generated attacks: (1) class-based attacks, where the model identifies a similar class to deceive itself, and (2) reasoned attacks, where an advanced LVLM suggests an attack combining a deceiving class and description. Our experiments show these attacks significantly reduce classification performance, by up to 60%, and are effective across different models, including InstructBLIP and MiniGPT-4. Code: https://github.com/mqraitem/Self-Gen-Typo-Attack
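To make the attack setup concrete, the core image manipulation can be sketched as rendering a deceiving label onto the input image before it is passed to the LVLM. This is a minimal illustrative sketch, not the authors' exact pipeline: the function name, banner placement, and example attack string are assumptions for demonstration.

```python
# Minimal sketch of a typographic attack: overlay misleading text on an
# image so an LVLM may read it and misclassify the visual content.
# The helper name and layout are illustrative assumptions.
from PIL import Image, ImageDraw

def apply_typographic_attack(image: Image.Image, attack_text: str) -> Image.Image:
    """Return a copy of `image` with a misleading text banner drawn on top."""
    attacked = image.copy()
    draw = ImageDraw.Draw(attacked)
    # A high-contrast banner keeps the deceiving text legible to the model.
    banner_height = max(20, attacked.height // 10)
    draw.rectangle([0, 0, attacked.width, banner_height], fill="black")
    draw.text((5, 4), attack_text, fill="white")
    return attacked

# Example: tag an image with a semantically similar (but wrong) class label,
# as in the class-based attack; the label here is hypothetical.
clean = Image.new("RGB", (224, 224), color="gray")
attacked = apply_typographic_attack(clean, "a photo of a labrador")
```

In the paper's self-generated variants, the attack string itself would come from querying the LVLM (for a confusable class, or a class plus supporting description) rather than being chosen by hand as above.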