Image-based Prompt Injection: Hijacking Multimodal LLMs through Visually Embedded Adversarial Instructions

📅 2026-03-03
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work proposes Image Prompt Injection (IPI), a novel attack method that exploits the vulnerability of multimodal large language models (MLLMs) to adversarial instructions embedded in images, enabling malicious manipulation of model behavior. IPI establishes the first end-to-end, image-level prompt injection framework under black-box settings, leveraging segmentation-based region selection, adaptive font scaling, and background-aware rendering to ensure injected prompts remain imperceptible to humans while remaining interpretable by the model. Experimental evaluation on the COCO dataset with GPT-4 Turbo demonstrates that IPI achieves a 64% attack success rate under strict visual stealth constraints, highlighting its effectiveness and potential threat in real-world scenarios.

Technology Category

Application Category

📝 Abstract
Multimodal Large Language Models (MLLMs) integrate vision and text to power applications, but this integration introduces new vulnerabilities. We study Image-based Prompt Injection (IPI), a black-box attack in which adversarial instructions are embedded into natural images to override model behavior. Our end-to-end IPI pipeline incorporates segmentation-based region selection, adaptive font scaling, and background-aware rendering to conceal prompts from human perception while preserving model interpretability. Using the COCO dataset and GPT-4-turbo, we evaluate 12 adversarial prompt strategies and multiple embedding configurations. The results show that IPI can reliably manipulate the output of the model, with the most effective configuration achieving up to 64\% attack success under stealth constraints. These findings highlight IPI as a practical threat in black-box settings and underscore the need for defenses against multimodal prompt injection.
Problem

Research questions and friction points this paper is trying to address.

Multimodal Large Language Models
Prompt Injection
Adversarial Attacks
Image-based Attack
Security Vulnerability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Image-based Prompt Injection
Multimodal LLMs
Adversarial Instructions
Black-box Attack
Stealthy Embedding
🔎 Similar Papers
No similar papers found.
N
Neha Nagaraja
School of Informatics, Computing, and Cyber Systems, Northern Arizona University, Flagstaff, USA
Lan Zhang
Lan Zhang
Assistant Professor, Northern Arizona University
CybersecurityMachine Learning
Z
Zhilong Wang
Bytedance, CA, USA
Bo Zhang
Bo Zhang
Meta
DatabaseVerifiable ComputationMachine Learning
P
Pawan Patil
Bytedance, CA, USA