Unified Prompt Attack Against Text-to-Image Generation Models

📅 2025-02-23

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work examines the robustness and security risks of text-to-image (T2I) generative models from an attack perspective. It proposes UPAM, a unified prompt attack framework that targets both the textual and the visual defenses of T2I models. UPAM combines four components: (i) Sphere-Probing Learning (SPL), which keeps gradient-based optimization possible even when defenses block the image output; (ii) Semantic-Enhancing Learning (SEL), which aligns the generated content with the attacker's intent; (iii) In-context Naturalness Enhancement (INE), which makes adversarial prompts harder for human examiners to detect; and (iv) Transferable Attack Learning (TAL), which allows effective attacks with minimal queries. According to the abstract, extensive experiments show that UPAM is superior in effectiveness, efficiency, naturalness, and low query detection rates.

📝 Abstract

Text-to-Image (T2I) models have advanced significantly, but their growing popularity raises security concerns due to their potential to generate harmful images. To address these issues, we propose UPAM, a novel framework to evaluate the robustness of T2I models from an attack perspective. Unlike prior methods that focus solely on textual defenses, UPAM unifies the attack on both textual and visual defenses. Additionally, it enables gradient-based optimization, overcoming reliance on enumeration for improved efficiency and effectiveness. To handle cases where T2I models block image outputs due to defenses, we introduce Sphere-Probing Learning (SPL) to enable optimization even without image results. Following SPL, our model bypasses defenses, inducing the generation of harmful content. To ensure semantic alignment with attacker intent, we propose Semantic-Enhancing Learning (SEL) for precise semantic control. UPAM also prioritizes the naturalness of adversarial prompts using In-context Naturalness Enhancement (INE), making them harder for human examiners to detect. Additionally, we address the issue of iterative queries--common in prior methods and easily detectable by API defenders--by introducing Transferable Attack Learning (TAL), allowing effective attacks with minimal queries. Extensive experiments validate UPAM's superiority in effectiveness, efficiency, naturalness, and low query detection rates.

Problem

Research questions and friction points this paper is trying to address.

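Evaluates the robustness of T2I models by attacking their textual and visual defenses in a unified way.

Enables optimization with Sphere-Probing Learning (SPL) even when defenses block image outputs.

Reduces the iterative queries that API defenders can easily detect, using Transferable Attack Learning (TAL).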

Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified attack on textual and visual defenses

Gradient-based optimization instead of enumeration, for efficiency and effectiveness

Semantic control enhancing attack precision
