VII: Visual Instruction Injection for Jailbreaking Image-to-Video Generation Models

📅 2026-02-24
📈 Citations: 1
Influential: 0
🤖 AI Summary
This work addresses a critical security vulnerability in current image-to-video (I2V) generation models, which are susceptible to visual instruction injection attacks that embed harmful intents implicitly within reference images to bypass safety mechanisms. The authors propose VII, the first training-free and transferable jailbreaking framework that leverages visual instructions in the image modality as attack vectors. By integrating malicious intent reprogramming with a visual instruction grounding module, VII enables highly effective and stealthy attacks without any modification to the target model. Coupled with semantic consistency rendering, the framework achieves up to an 83.5% attack success rate across four mainstream commercial I2V models, with near-zero refusal rates, substantially outperforming existing approaches.

📝 Abstract
Image-to-Video (I2V) generation models, which condition video generation on reference images, have shown emerging visual instruction-following capability, allowing certain visual cues in reference images to act as implicit control signals for video generation. However, this capability also introduces a previously overlooked risk: adversaries may exploit visual instructions to inject malicious intent through the image modality. In this work, we uncover this risk by proposing Visual Instruction Injection (VII), a training-free and transferable jailbreaking framework that intentionally disguises the malicious intent of unsafe text prompts as benign visual instructions in the safe reference image. Specifically, VII coordinates a Malicious Intent Reprogramming module to distill malicious intent from unsafe text prompts while minimizing their static harmfulness, and a Visual Instruction Grounding module to ground the distilled intent onto a safe input image by rendering visual instructions that preserve semantic consistency with the original unsafe text prompt, thereby inducing harmful content during I2V generation. Empirically, our extensive experiments on four state-of-the-art commercial I2V models (Kling-v2.5-turbo, Gemini Veo-3.1, Seedance-1.5-pro, and PixVerse-V5) demonstrate that VII achieves Attack Success Rates of up to 83.5% while reducing Refusal Rates to near zero, significantly outperforming existing baselines.
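The abstract describes a two-module pipeline: Malicious Intent Reprogramming splits an unsafe prompt into a benign scene description and a cue, and Visual Instruction Grounding renders that cue onto the safe reference image. A minimal conceptual sketch of this flow is below; all function names, the clause-splitting heuristic, and the dict-based "image" are illustrative assumptions (the paper does not disclose its implementation), and an innocuous prompt stands in for an unsafe one.

```python
# Conceptual sketch of the VII pipeline from the abstract. Names and the
# splitting heuristic are assumptions; "rendering" is simulated by recording
# the cue as an overlay on a toy image structure rather than drawing pixels.

def reprogram_intent(unsafe_prompt: str) -> dict:
    """Malicious Intent Reprogramming (sketch): separate the prompt into a
    benign scene description and a cue to be carried by the image instead."""
    # Toy heuristic: first clause = scene, remainder = visual cue.
    scene, _, cue = unsafe_prompt.partition(",")
    return {"benign_scene": scene.strip(), "visual_cue": cue.strip()}

def ground_instruction(reference_image: dict, visual_cue: str) -> dict:
    """Visual Instruction Grounding (sketch): attach the distilled cue to the
    safe reference image as an overlaid visual instruction."""
    grounded = dict(reference_image)
    grounded["overlays"] = reference_image.get("overlays", []) + [visual_cue]
    return grounded

# Example with an innocuous stand-in prompt.
parts = reprogram_intent("a person at a table, picks up the red object")
image = {"pixels": "...", "overlays": []}
attack_input = ground_instruction(image, parts["visual_cue"])
print(parts["benign_scene"])      # → a person at a table  (text sent to the I2V model)
print(attack_input["overlays"])   # → ['picks up the red object']  (cue now in the image)
```

The point of the split is that the text prompt alone passes a static safety filter, while the image carries the instruction that steers generation.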
Problem

Research questions and friction points this paper is trying to address.

Image-to-Video generation, visual instruction, jailbreaking, adversarial attack, safety risk
Innovation

Methods, ideas, or system contributions that make the work stand out.

Visual Instruction Injection, Image-to-Video Generation, Jailbreaking, Malicious Intent Reprogramming, Training-Free Attack
Bowen Zheng
National Anti-Counterfeit Engineering Research Center, Huazhong University of Science and Technology; School of Electronic Information and Communications, Huazhong University of Science and Technology
Yongli Xiang
University of Sydney
Trustworthy AI
Ziming Hong
The University of Sydney
Trustworthy AI
Zerong Lin
National Anti-Counterfeit Engineering Research Center, Huazhong University of Science and Technology; School of Electronic Information and Communications, Huazhong University of Science and Technology
Chaojian Yu
Huazhong University of Science and Technology (HUST)
Adversarial Machine Learning, Computer Vision
Tongliang Liu
Director, Sydney AI Centre, University of Sydney & Mohamed bin Zayed University of AI
Machine Learning, Learning with Noisy Labels, Trustworthy Machine Learning
Xinge You
Professor, School of Electronic Information and Communications, Huazhong University of Science and Technology
Computer Vision, Pattern Recognition, Machine Learning, Wavelet Analysis and its Applications