Mind the Trojan Horse: Image Prompt Adapter Enabling Scalable and Deceptive Jailbreaking

📅 2025-04-08
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work identifies and systematically validates a novel "hijacking jailbreak attack" introduced by integrating the IP-Adapter into text-to-image diffusion models (T2I-DMs): an attacker can upload imperceptible image-space adversarial examples that cause widespread, unintentional jailbreaking by benign users, severely compromising the reliability and trustworthiness of Image Generation Services (IGS). The authors propose a hijacking attack paradigm tailored to T2I-IP-DMs that substantially lowers the barrier to constructing adversarial examples: by analyzing the IP-Adapter architecture and exploiting its dependence on open-source image encoders, the method applies image-space adversarial perturbations and achieves attack success rates above 92% across multiple mainstream models. To counter this threat, they design a defense that combines the IP-Adapter with adversarially trained models, mitigating both the false negatives and false positives of existing detection approaches. The study reveals a critical vulnerability in multimodal alignment modules and provides both an actionable attack benchmark and a robust defensive strategy for secure T2I model deployment.
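The core technical step the summary describes, crafting an imperceptible perturbation so that the open-source image encoder maps a benign-looking image to the embedding of a harmful target, can be sketched as a projected-gradient feature-matching attack. The sketch below is an illustrative toy, not the paper's code: a random linear map stands in for the image encoder, and the loss, step size, and L-infinity budget are all assumed values.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 64))  # toy stand-in for the image encoder: R^64 -> R^8

def encode(x):
    """Embed an image vector; real attacks would use the IP-Adapter's encoder."""
    return W @ x

def pgd_feature_match(x, target_emb, eps=0.05, step=0.01, iters=200):
    """PGD minimizing ||encode(x + delta) - target_emb||^2 under an
    L-infinity bound eps, which keeps the perturbation imperceptible."""
    delta = np.zeros_like(x)
    for _ in range(iters):
        diff = encode(x + delta) - target_emb
        grad = 2 * W.T @ diff              # analytic gradient of the squared loss
        delta -= step * np.sign(grad)      # signed-gradient descent step
        delta = np.clip(delta, -eps, eps)  # project back into the L-inf ball
    return delta

benign = rng.uniform(0, 1, 64)   # benign-looking cover image
target = rng.uniform(0, 1, 64)   # image whose embedding the attacker wants matched
delta = pgd_feature_match(benign, encode(target))

before = np.linalg.norm(encode(benign) - encode(target))
after = np.linalg.norm(encode(benign + delta) - encode(target))
```

Because the encoder weights are public, this optimization needs no access to the diffusion model itself, which is why the summary notes that open-source encoders lower the knowledge barrier for the attack.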

๐Ÿ“ Abstract
Recently, the Image Prompt Adapter (IP-Adapter) has been increasingly integrated into text-to-image diffusion models (T2I-DMs) to improve controllability. However, in this paper, we reveal that T2I-DMs equipped with the IP-Adapter (T2I-IP-DMs) enable a new jailbreak attack named the hijacking attack. We demonstrate that, by uploading imperceptible image-space adversarial examples (AEs), the adversary can hijack massive benign users to jailbreak an Image Generation Service (IGS) driven by T2I-IP-DMs and mislead the public to discredit the service provider. Worse still, the IP-Adapter's dependency on open-source image encoders reduces the knowledge required to craft AEs. Extensive experiments verify the technical feasibility of the hijacking attack. In light of the revealed threat, we investigate several existing defenses and explore combining the IP-Adapter with adversarially trained models to overcome existing defenses' limitations. Our code is available at https://github.com/fhdnskfbeuv/attackIPA.
Problem

Research questions and friction points this paper is trying to address.

IP-Adapter enables hijacking attacks on T2I-DMs
Adversarial examples can jailbreak Image Generation Services
Open-source image encoders lower attack knowledge barrier
Innovation

Methods, ideas, or system contributions that make the work stand out.

IP-Adapter enables scalable jailbreaking attacks
Adversarial examples hijack benign users
Combines IP-Adapter with adversarially trained defenses
Junxi Chen, Sun Yat-sen University
Junhao Dong, Nanyang Technological University, Singapore
Xiaohua Xie, School of Computer Science and Engineering, Sun Yat-sen University, China; Guangdong Province Key Laboratory of Information Security Technology, China