Generate "Normal", Edit Poisoned: Branding Injection via Hint Embedding in Image Editing

📅 2026-05-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work uncovers a stealthy attack in which adversaries embed nearly imperceptible brand cues, such as logos, into input images. Downstream image-editing models in a multi-turn generation workflow recognize the embedded cue and reproduce it on semantically related objects, even though the user prompt never mentions it, which amounts to implicit content injection. The attack is demonstrated in both a phishing-based and a poison-based scenario, with average success rates of 44.4% and 32.2%, respectively. The authors present this as the first hint-embedding-based method for covert brand injection. They also propose a mitigation that achieves average success rates of 87.4% and 92.3% against the phishing-based and poison-based attacks, respectively.

📝 Abstract

With the rapid advancement of generative AI, users increasingly rely on image-generation models for image design and creation. To achieve faithful outputs, users typically engage in multi-turn interactions during image refinement: a text-to-image generation phase followed by a text-guided image-to-image editing phase. In this paper, we investigate a novel security vulnerability associated with such a workflow. Our key insight is that a nearly invisible hint, like branding information (e.g., a logo), embedded in an input image can be recognized by downstream generative models and subsequently re-rendered onto semantically related objects, even when the user prompt does not explicitly mention it. This form of hidden payload injection makes the attack stealthy. We study two realistic attack scenarios. The first is a phishing-based setting, in which an attacker controls an online image generation service and injects hidden content into generated images before they are returned to users. The second is a poison-based setting, in which an attacker distributes a compromised text-to-image diffusion model whose outputs contain hidden content. We evaluate both attacks using six injected payloads, including well-known logos and customized designs, and demonstrate that the two attacks achieve average success rates of 44.4% and 32.2%, respectively, while keeping the injected logos visually imperceptible. We also develop a mitigation solution that achieves average success rates of 87.4% and 92.3% against the phishing-based and poison-based attacks, respectively.
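
To make "nearly invisible" concrete, the toy sketch below blends a faint pattern into a grayscale image and measures how little the pixels change. It is an illustration under stated assumptions, not the paper's embedding method: the low-opacity alpha blend, the hypothetical helpers `blend_hint` and `psnr`, the 2% opacity, and the checkerboard standing in for a logo are all chosen here for demonstration.

```python
import math


def blend_hint(image, hint, alpha=0.02):
    """Alpha-blend `hint` into `image` at low opacity.

    Both arguments are same-size 2D lists of 0-255 grayscale values.
    This blend is an illustrative assumption, not the paper's method.
    """
    return [
        [round((1 - alpha) * p + alpha * h) for p, h in zip(img_row, hint_row)]
        for img_row, hint_row in zip(image, hint)
    ]


def psnr(a, b):
    """Peak signal-to-noise ratio in dB between two same-size 8-bit images."""
    diffs = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    mse = sum(diffs) / len(diffs)
    return math.inf if mse == 0 else 10 * math.log10(255 ** 2 / mse)


# Toy data: a flat mid-gray "generated image" and a checkerboard
# standing in for a logo (both hypothetical).
image = [[128] * 8 for _ in range(8)]
hint = [[255 if (r + c) % 2 == 0 else 0 for c in range(8)] for r in range(8)]

stego = blend_hint(image, hint, alpha=0.02)

# Largest per-pixel change introduced by the blend, on a 0-255 scale.
max_change = max(
    abs(s - p) for rs, rp in zip(stego, image) for s, p in zip(rs, rp)
)
quality_db = psnr(image, stego)
```

A small `max_change` and a high `quality_db` mean the blended image is hard to tell apart from the original by eye. Whether a downstream editing model still picks up such a faint pattern is the empirical question the paper studies; this sketch does not test it.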

Problem

Research questions and friction points this paper is trying to address.

image editing

branding injection

hint embedding

generative AI

security vulnerability

Innovation

Methods, ideas, or system contributions that make the work stand out.

hint embedding

branding injection

image editing attack