Inception: Jailbreak the Memory Mechanism of Text-to-Image Generation Systems

📅 2025-04-29
🤖 AI Summary
This work exposes a critical security vulnerability in the memory mechanisms of text-to-image generation systems (e.g., DALL·E 3) that significantly amplifies the risk of multi-turn jailbreak attacks. To exploit this flaw, the authors propose Inception, the first multi-turn jailbreak attack against the memory mechanism: it partitions a malicious prompt into semantically coherent, benign-looking sub-prompts through segmentation and recursive decomposition of minimum unsafe words, then relies on the system's memory retrieval to accumulate the content across turns. Every individual request prompt is benign, yet the sequence can lead to the prohibited target image. Evaluated on DALL·E 3, the method surpasses the state of the art by a 14% margin in attack success rate. The results show that memory mechanisms can be systematically exploited and provide evidence for security assessment of generative AI.

📝 Abstract
The memory mechanism is now widely and successfully used in online text-to-image (T2I) generation systems (e.g., DALL·E 3) to alleviate the growing tokenization burden and capture key information in multi-turn interactions. Despite its practicality, its security analysis has fallen far behind. In this paper, we reveal that this mechanism exacerbates the risk of jailbreak attacks. Previous attacks fuse the unsafe target prompt into one ultimate adversarial prompt, which can be easily detected or may generate safe images due to under- or over-optimization. In contrast, we propose Inception, the first multi-turn jailbreak attack against the memory mechanism in real-world text-to-image generation systems. Inception embeds the malice at the inception of the chat session turn by turn, leveraging the fact that T2I generation systems retrieve key information from their memory. Inception consists of two main modules. First, segmentation splits the unsafe prompt into chunks, which are subsequently fed to the system over multiple turns, serving as pseudo-gradients for directive optimization. To this end, we develop a series of segmentation policies that ensure the generated images are semantically consistent with the target prompt. Second, to overcome the inseparability of minimum unsafe words that remains after segmentation, we propose recursion, a strategy that makes minimum unsafe words subdivisible. Together, segmentation and recursion ensure that all request prompts are benign yet can lead to malicious outcomes. We conduct experiments on a real-world text-to-image generation system (i.e., DALL·E 3) to validate the effectiveness of Inception. The results indicate that Inception surpasses the state of the art by a 14% margin in attack success rate.
Problem

Research questions and friction points this paper is trying to address.

Analyzes security risks in memory mechanisms of T2I systems
Proposes multi-turn jailbreak attack bypassing current defenses
Segments unsafe prompts into benign-looking incremental inputs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Segments unsafe prompts into benign chunks
Uses recursion to subdivide minimum unsafe words
Leverages memory mechanism for multi-turn attacks
Authors

Shiqian Zhao, Nanyang Technological University of Singapore (Robust AI, AI Security, Automatic Driving)
Jiayang Liu, University of Science and Technology of China (Adversarial example, AI security)
Yiming Li, Nanyang Technological University
Runyi Hu, Nanyang Technological University (Large Language Model, AI Alignment, Watermarking)
Xiaojun Jia, Nanyang Technological University (Explainable AI, Robust AI, Efficient AI)
Wenshu Fan, University of Electronic Science and Technology of China
Xinfeng Li, Nanyang Technological University
Jie Zhang, A*STAR
Wei Dong, Nanyang Technological University
Tianwei Zhang, Nanyang Technological University
Anh Tuan Luu, Nanyang Technological University