🤖 AI Summary
Text-guided image editing methods are vulnerable to adversarial perturbations that break the alignment between an image and its textual description. This paper proposes the first stealthy black-box attack on the cross-attention mechanism of such editors: without access to the editing model or the target prompt, it perturbs the source image so as to disrupt text–image cross-attention, using only an automatically generated caption of the source image as a proxy for the edit prompt. Two new evaluation metrics are introduced, Caption Similarity and Semantic IoU, complemented by spatial-layout analysis of segmentation masks, enabling a more comprehensive assessment of semantic consistency and immunization success. On the TEDBench++ benchmark, the attack significantly degrades editing fidelity (an average drop of 42.7%) while remaining visually imperceptible (LPIPS < 0.05), systematically exposing the fragility of text-guided image editing systems and establishing a foundation for future robustness research.
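The attack idea above can be illustrated at toy scale: perturb the image representation, within an imperceptibility budget, so that its cross-attention with the proxy-caption embeddings drifts away from the clean attention map. This is a minimal PGD-style sketch under assumptions; the function names, the finite-difference gradient, and the toy feature matrices are illustrative, not the paper's implementation.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def attn_map(img_feats, txt_emb):
    # Cross-attention weights: each image patch attends over the caption tokens.
    d = txt_emb.shape[-1]
    return softmax(img_feats @ txt_emb.T / np.sqrt(d))

def attention_attack(img_feats, cap_emb, eps=0.1, step=0.02, iters=25,
                     fd=1e-4, seed=0):
    """Hypothetical sketch: PGD-style perturbation of the image features,
    projected to an L_inf ball of radius eps, that pushes the cross-attention
    map with the proxy-caption embeddings away from its clean value.
    Gradients are finite differences, which is fine at this toy scale."""
    rng = np.random.default_rng(seed)
    clean = attn_map(img_feats, cap_emb)

    def disruption(x):
        return float(np.sum((attn_map(x, cap_emb) - clean) ** 2))

    # Random start inside the ball (the gradient at the clean point is zero).
    x = img_feats + rng.uniform(-eps / 2, eps / 2, size=img_feats.shape)
    for _ in range(iters):
        g = np.zeros_like(x)
        for idx in np.ndindex(*x.shape):
            xp, xm = x.copy(), x.copy()
            xp[idx] += fd
            xm[idx] -= fd
            g[idx] = (disruption(xp) - disruption(xm)) / (2 * fd)
        x = x + step * np.sign(g)                          # ascend the loss
        x = img_feats + np.clip(x - img_feats, -eps, eps)  # project to the ball
    return x
```

In the real black-box setting the gradient would be taken through a surrogate encoder rather than by finite differences; the key point is that only the source image and its generated caption are needed, never the editing model or the target prompt.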
📝 Abstract
Recent advances in text-based image editing have enabled fine-grained manipulation of visual content guided by natural language. However, such methods are susceptible to adversarial attacks. In this work, we propose a novel attack that targets the visual component of editing methods. We introduce Attention Attack, which disrupts the cross-attention between a textual prompt and the visual representation of the image by using an automatically generated caption of the source image as a proxy for the edit prompt. This breaks the alignment between the contents of the image and their textual description without requiring knowledge of the editing method or the editing prompt. Reflecting on the reliability of existing metrics for immunization success, we further propose two evaluation strategies: Caption Similarity, which quantifies semantic consistency between the original and adversarial edits, and Semantic Intersection over Union (IoU), which measures spatial-layout disruption via segmentation masks. Experiments on the TEDBench++ benchmark demonstrate that our attack significantly degrades editing performance while remaining imperceptible.
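The two proposed metrics can be sketched directly from their descriptions: Caption Similarity compares embeddings of captions generated for the original and adversarial edits, and Semantic IoU compares binary segmentation masks of the edited object. A minimal sketch, assuming caption embeddings from any sentence encoder and precomputed masks (the exact definitions in the paper may differ):

```python
import numpy as np

def caption_similarity(emb_original, emb_adversarial):
    # Cosine similarity between caption embeddings of the edit of the clean
    # image and the edit of the attacked image; lower means the attack
    # changed what the edited image depicts.
    a = np.asarray(emb_original, dtype=float).ravel()
    b = np.asarray(emb_adversarial, dtype=float).ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def semantic_iou(mask_original, mask_adversarial):
    # Intersection over Union between binary segmentation masks of the
    # edited object, measuring how much the spatial layout was disrupted.
    a = np.asarray(mask_original, dtype=bool)
    b = np.asarray(mask_adversarial, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:  # neither mask contains the object
        return 1.0
    return float(np.logical_and(a, b).sum() / union)
```

A successful attack should drive both scores down: the adversarial edit neither describes the same content (low Caption Similarity) nor places it in the same region (low Semantic IoU).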