When the Prompt Becomes Visual: Vision-Centric Jailbreak Attacks for Large Image Editing Models

📅 2026-02-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the vulnerability of vision-prompt-driven image editing models to jailbreak attacks that exploit purely visual inputs to circumvent conventional safety mechanisms. It presents the first formal characterization of the jailbreak attack surface inherent in visual prompts, introduces a training-free defense based on introspective multimodal reasoning, and establishes IESBench, the first benchmark for evaluating the safety of image editing models against visual jailbreaks. The proposed attack, VJA, achieves success rates of 80.9% and 70.1% on Nano Banana Pro and GPT-Image-1.5, respectively. The defense substantially improves the safety of weakly aligned models, bringing their robustness to a level comparable with commercial systems, without auxiliary guard models and with negligible computational overhead.
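
The summary describes the defense only at a high level. The sketch below is a minimal illustration of the general introspection idea: force the implicit visual instruction into explicit text, let the same model judge that text, and only then edit. The `mm_model` client, its `generate`/`edit` methods, and the keyword list are hypothetical stand-ins, not the paper's actual pipeline or API.

```python
# Minimal sketch of a training-free, introspection-style defense.
# `mm_model` and its methods are hypothetical, not the paper's API.

UNSAFE_KEYWORDS = {"weapon", "nudity", "gore", "forgery"}  # toy policy list

def introspect_then_edit(mm_model, image, visual_prompt_image):
    """Verbalize the edit intent inferred from the visual prompt,
    screen that intent, and only then perform the edit."""
    # Step 1: introspection -- surface the implicit visual instruction
    # as explicit text so it can be inspected at all.
    intent = mm_model.generate(
        images=[image, visual_prompt_image],
        prompt="Describe, in one sentence, the edit these visual marks request.",
    )

    # Step 2: self-check -- reuse the same model as its own judge
    # (no auxiliary guard model, negligible extra compute).
    verdict = mm_model.generate(
        prompt=f"Is this edit request safe to carry out? Answer SAFE or UNSAFE: {intent}",
    )
    if "UNSAFE" in verdict.upper() or any(k in intent.lower() for k in UNSAFE_KEYWORDS):
        return None  # refuse the edit

    # Step 3: proceed with the now-vetted, explicit instruction.
    return mm_model.edit(image=image, instruction=intent)
```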

📝 Abstract
Recent advances in large image editing models have shifted the paradigm from text-driven instructions to vision-prompt editing, where user intent is inferred directly from visual inputs such as marks, arrows, and visual-text prompts. While this paradigm greatly expands usability, it also introduces a critical and underexplored safety risk: the attack surface itself becomes visual. In this work, we propose Vision-Centric Jailbreak Attack (VJA), the first visual-to-visual jailbreak attack that conveys malicious instructions purely through visual inputs. To systematically study this emerging threat, we introduce IESBench, a safety-oriented benchmark for image editing models. Extensive experiments on IESBench demonstrate that VJA effectively compromises state-of-the-art commercial models, achieving attack success rates of up to 80.9% on Nano Banana Pro and 70.1% on GPT-Image-1.5. To mitigate this vulnerability, we propose a training-free defense based on introspective multimodal reasoning, which substantially improves the safety of poorly aligned models to a level comparable with commercial systems, without auxiliary guard models and with negligible computational overhead. Our findings expose new vulnerabilities and provide both a benchmark and a practical defense to advance safe and trustworthy modern image editing systems. Warning: This paper contains offensive images created by large image editing models.
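
To make the "attack surface becomes visual" point concrete: in vision-prompt editing, an instruction can live entirely in pixel space, where a text-only safety filter never sees it. The sketch below, using Pillow, merely illustrates that channel (rendering an instruction and a pointing arrow into an image); it is not the paper's VJA method, and the file names and coordinates are arbitrary examples.

```python
# Illustration of the visual attack surface: the instruction is rendered
# into pixels rather than sent as text, so a text-only filter misses it.
from PIL import Image, ImageDraw

def embed_visual_instruction(src_path: str, instruction: str, out_path: str) -> None:
    """Draw an instruction and a pointing arrow directly onto an image."""
    img = Image.open(src_path).convert("RGB")
    draw = ImageDraw.Draw(img)

    # Visual-text prompt: the instruction exists only in pixel space.
    draw.text((10, 10), instruction, fill="red")

    # A mark/arrow pointing at the region to be edited (toy coordinates).
    draw.line([(60, 40), (120, 100)], fill="red", width=3)
    draw.polygon([(120, 100), (108, 92), (112, 112)], fill="red")

    img.save(out_path)

# The text channel can then carry only a benign request, while the
# actual edit intent rides inside the image itself:
# embed_visual_instruction("photo.png", "replace the sign text", "prompted.png")
```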
Problem

Research questions and friction points this paper is trying to address.

vision-centric jailbreak
image editing models
visual prompts
safety risk
adversarial attack
Innovation

Methods, ideas, or system contributions that make the work stand out.

Vision-Centric Jailbreak Attack
Visual Prompting
Image Editing Safety
IESBench
Training-Free Defense
Jiacheng Hou
Tsinghua University, China
Yining Sun
Johns Hopkins University
Ruochong Jin
Tsinghua University, China; Peng Cheng Laboratory, Shenzhen, China
Haochen Han
Peng Cheng Laboratory, Shenzhen, China
Fangming Liu
Professor, School of Computer Science & Technology, Huazhong University of Science & Technology
Wai Kin Victor Chan
Tsinghua University, Tsinghua-Berkeley Shenzhen Institute
chanw@sz.tsinghua.edu.cn
Alex Jinpeng Wang
Central South University, Changsha, China