EmoAttack: Emotion-to-Image Diffusion Models for Emotional Backdoor Generation

📅 2024-06-22
🏛️ arXiv.org
📈 Citations: 2
Influential: 0

🤖 AI Summary
This work identifies EmoAttack, an affective backdoor attack against text-to-image diffusion models: adversaries use emotion-laden tokens (e.g., “angry”, “fearful”) in input prompts as stealthy triggers that inject malicious negative visual content into the generated image, thereby eliciting adverse user affect. To implant the backdoor efficiently and covertly, the authors formulate backdoor construction as a diffusion personalization problem and propose EmoBooth, a lightweight LoRA-based fine-tuning method that combines semantic emotion clustering with cross-modal alignment to map clusters of emotion words to malicious reference images. Evaluated on a newly curated affective backdoor dataset, EmoAttack achieves a trigger success rate above 92% while preserving image fidelity and emotion–content consistency, without affecting outputs for benign prompts. The work is presented as the first systematic identification and empirical validation of emotion-driven backdoor vulnerabilities in diffusion models, and as a new direction for multimodal AI security research.

📝 Abstract
Text-to-image diffusion models can create realistic images based on input texts. Users can describe an object to convey their opinions visually. In this work, we unveil a previously unrecognized, latent risk of using diffusion models to generate images: emotion in the input text can be exploited to introduce negative content, potentially eliciting unfavorable emotions in users. Emotions play a crucial role in expressing personal opinions in our daily interactions, and the inclusion of maliciously negative content can lead users astray, exacerbating negative emotions. Specifically, we identify an emotion-aware backdoor attack (EmoAttack) that can incorporate malicious negative content triggered by emotional texts during image generation. We formulate such an attack as a diffusion personalization problem to avoid extensive model retraining and propose EmoBooth. Unlike existing personalization methods, our approach fine-tunes a pre-trained diffusion model by establishing a mapping between a cluster of emotional words and a given reference image containing malicious negative content. To validate our method, we built a dataset and conducted extensive analysis and discussion of its effectiveness. Given consumers' widespread use of diffusion models, uncovering this threat is critical for society.
Problem

Research questions and friction points this paper is trying to address.

Investigates emotion-triggered backdoor attacks in text-to-image diffusion models
Proposes EmoAttack to inject malicious content via emotional text inputs
Highlights societal risks of emotion-based manipulation in AI-generated images
Innovation

Methods, ideas, or system contributions that make the work stand out.

Emotion-triggered backdoor attack on text-to-image diffusion models
Fine-tuning that maps a cluster of emotional words to a malicious reference image
EmoBooth, a diffusion personalization method that avoids extensive model retraining
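
The idea of emotional word clusters acting as triggers can be illustrated from the auditing side. The following is a minimal sketch, not the paper's method: it only checks whether a prompt contains a word from a hand-written emotion cluster, so that prompts can be split into "triggered" and "benign" groups when evaluating a model. The cluster contents (beyond "angry" and "fearful", which the summary mentions) and the names `EMOTION_CLUSTERS`, `find_emotion_triggers`, and `split_prompts` are assumptions made for illustration.

```python
import re

# Hypothetical emotion clusters. Only "angry" and "fearful" come from the
# summary above; the remaining words are illustrative assumptions.
EMOTION_CLUSTERS = {
    "anger": {"angry", "furious", "enraged"},
    "fear": {"fearful", "scared", "terrified"},
}


def find_emotion_triggers(prompt):
    """Return {cluster_name: sorted matched words} for one prompt."""
    # Lowercase and keep alphabetic tokens only, so punctuation is ignored.
    tokens = set(re.findall(r"[a-z]+", prompt.lower()))
    hits = {}
    for name, words in EMOTION_CLUSTERS.items():
        matched = sorted(tokens & words)
        if matched:
            hits[name] = matched
    return hits


def split_prompts(prompts):
    """Partition prompts into (triggered, benign) lists, preserving order."""
    triggered = [p for p in prompts if find_emotion_triggers(p)]
    benign = [p for p in prompts if not find_emotion_triggers(p)]
    return triggered, benign
```

A split like this could be used to compare a model's outputs on emotion-bearing prompts against its outputs on neutral ones. Exact word matching is a simplification; clustering by semantic similarity, as the summary describes, would also catch synonyms that a fixed word list misses.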
🔎 Similar Papers