Image-based Prompt Injection: Hijacking Multimodal LLMs through Visually Embedded Adversarial Instructions

📅 2026-03-03

📈 Citations: 0

✨ Influential: 0

career value

207K/year

🤖 AI Summary

This work proposes Image Prompt Injection (IPI), a novel attack method that exploits the vulnerability of multimodal large language models (MLLMs) to adversarial instructions embedded in images, enabling malicious manipulation of model behavior. IPI establishes the first end-to-end, image-level prompt injection framework under black-box settings, leveraging segmentation-based region selection, adaptive font scaling, and background-aware rendering to ensure injected prompts remain imperceptible to humans while remaining interpretable by the model. Experimental evaluation on the COCO dataset with GPT-4 Turbo demonstrates that IPI achieves a 64% attack success rate under strict visual stealth constraints, highlighting its effectiveness and potential threat in real-world scenarios.

Technology Category

Application Category

📝 Abstract

Multimodal Large Language Models (MLLMs) integrate vision and text to power applications, but this integration introduces new vulnerabilities. We study Image-based Prompt Injection (IPI), a black-box attack in which adversarial instructions are embedded into natural images to override model behavior. Our end-to-end IPI pipeline incorporates segmentation-based region selection, adaptive font scaling, and background-aware rendering to conceal prompts from human perception while preserving model interpretability. Using the COCO dataset and GPT-4-turbo, we evaluate 12 adversarial prompt strategies and multiple embedding configurations. The results show that IPI can reliably manipulate the output of the model, with the most effective configuration achieving up to 64\% attack success under stealth constraints. These findings highlight IPI as a practical threat in black-box settings and underscore the need for defenses against multimodal prompt injection.

Problem

Research questions and friction points this paper is trying to address.

Multimodal Large Language Models

Prompt Injection

Adversarial Attacks

Image-based Attack

Security Vulnerability

Innovation

Methods, ideas, or system contributions that make the work stand out.

Image-based Prompt Injection

Multimodal LLMs

Adversarial Instructions