CAHS-Attack: CLIP-Aware Heuristic Search Attack Method for Stable Diffusion

📅 2025-11-26
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Diffusion models exhibit significant vulnerability to adversarial prompts, yet existing white-box gradient-based attacks and manual prompt engineering approaches suffer from limited accessibility and poor generalizability. This paper proposes the first fully black-box, gradient-free adversarial prompt generation framework for Stable Diffusion. The method integrates a constrained genetic algorithm—used to pre-select semantically similar initial prompts—with Monte Carlo Tree Search (MCTS), leveraging CLIP text embeddings as a semantic-guided heuristic to preserve the most semantics-disruptive paths during MCTS rollouts. Attack robustness is further enhanced via suffix fine-tuning. Experiments demonstrate state-of-the-art success rates across diverse prompt lengths and semantic categories. Crucially, the work systematically identifies the CLIP text encoder as the fundamental source of security vulnerability in text-to-image diffusion models—a finding previously unreported.

📝 Abstract
Diffusion models exhibit notable fragility when faced with adversarial prompts, and strengthening attack capabilities is crucial for uncovering such vulnerabilities and building more robust generative systems. Existing works often rely on white-box access to model gradients or hand-crafted prompt engineering, which is infeasible in real-world deployments due to restricted access or poor attack effectiveness. In this paper, we propose CAHS-Attack, a CLIP-Aware Heuristic Search attack method. CAHS-Attack integrates Monte Carlo Tree Search (MCTS) to perform fine-grained suffix optimization, leveraging a constrained genetic algorithm to preselect high-potential adversarial prompts as root nodes, and retaining the most semantically disruptive outcome at each simulation rollout for efficient local search. Extensive experiments demonstrate that our method achieves state-of-the-art attack performance across both short and long prompts of varying semantics. Furthermore, we find that the fragility of SD models can be attributed to the inherent vulnerability of their CLIP-based text encoders, suggesting a fundamental security risk in current text-to-image pipelines.
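The search heuristic described above is driven by CLIP text embeddings: a candidate adversarial prompt is scored by how far it pushes the prompt's embedding away from the original. The paper does not publish implementation details, but a natural reading of "semantically disruptive" is one minus the cosine similarity between the two text embeddings. A minimal sketch, assuming the embedding vectors have already been produced by SD's CLIP text encoder (the function name and interface are hypothetical):

```python
import numpy as np

def semantic_disruption(orig_emb: np.ndarray, adv_emb: np.ndarray) -> float:
    """Score how far an adversarial prompt's CLIP text embedding has drifted
    from the original prompt's embedding, as 1 - cosine similarity.
    Higher values mean the adversarial prompt is more semantically disruptive.
    Both arguments are assumed to be 1-D embedding vectors from the same
    CLIP text encoder."""
    orig = orig_emb / np.linalg.norm(orig_emb)
    adv = adv_emb / np.linalg.norm(adv_emb)
    return 1.0 - float(orig @ adv)
```

In practice this score would serve as the rollout value inside the MCTS loop, so the search keeps whichever suffix maximizes embedding drift while other constraints (e.g. the genetic algorithm's semantic-similarity preselection) keep the prompt plausible.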
Problem

Research questions and friction points this paper is trying to address.

Addressing adversarial prompt vulnerabilities in Stable Diffusion models
Overcoming limitations of white-box access and manual prompt engineering
Enhancing attack capabilities to uncover generative system weaknesses
Innovation

Methods, ideas, or system contributions that make the work stand out.

Monte Carlo Tree Search for fine-grained suffix optimization
Constrained genetic algorithm preselects adversarial prompts
Retains semantically disruptive outcomes for efficient local search
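The three contributions above (a preselected root prompt, MCTS over candidate suffixes, and retaining the most disruptive rollout outcome as a local search) can be sketched as a minimal, self-contained MCTS loop. Everything here is a hypothetical illustration of the search structure, not the authors' implementation: the `score` callback stands in for the CLIP-based disruption heuristic, and the token vocabulary, depth, and UCB constant are placeholders.

```python
import math
import random

def mcts_suffix_search(root_prompt, tokens, score, iters=200, depth=3, c=1.4):
    """Toy MCTS over adversarial suffixes (hypothetical sketch).
    `score(prompt)` is the semantic-disruption heuristic; in CAHS-Attack this
    would be computed from CLIP text embeddings, and `root_prompt` would be
    a high-potential prompt preselected by the constrained genetic algorithm.
    Returns the best (prompt, score) pair seen across all rollouts."""
    stats = {(): [1, 0.0]}          # suffix tuple -> [visits, total value]
    best = (root_prompt, score(root_prompt))

    def ucb(parent, child):
        n_p = stats[parent][0]
        n_c, v_c = stats.get(child, (0, 0.0))
        if n_c == 0:
            return float("inf")     # always try unvisited children first
        return v_c / n_c + c * math.sqrt(math.log(n_p) / n_c)

    for _ in range(iters):
        # Selection / expansion: walk down the tree, choosing children by UCB.
        node = ()
        while len(node) < depth:
            children = [node + (t,) for t in tokens]
            node = max(children, key=lambda ch: ucb(node, ch))
            if node not in stats:
                stats[node] = [0, 0.0]
                break
        # Simulation: random rollout to full depth; retain the most
        # semantically disruptive outcome as the running best (local search).
        rollout = list(node) + random.choices(tokens, k=depth - len(node))
        prompt = root_prompt + " " + " ".join(rollout)
        value = score(prompt)
        if value > best[1]:
            best = (prompt, value)
        # Backpropagation along the selected path.
        for i in range(len(node) + 1):
            stats.setdefault(node[:i], [0, 0.0])
            stats[node[:i]][0] += 1
            stats[node[:i]][1] += value
    return best
```

The key deviation from textbook MCTS, reflecting the paper's description, is that the rollout's raw outcome is kept whenever it beats the current best, rather than only feeding back into node statistics.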
Shuhan Xia
Beijing University of Posts and Telecommunications (BUPT)
Artificial Intelligence · Multimodal
Jing Dai
China Mobile Ltd, Department of Cyber and Information Security Management, Beijing, China
Hui Ouyang
Aspire Information Technology (Beijing) Company Limited, Beijing, China
Yadong Shang
China Mobile Ltd, Department of Cyber and Information Security Management, Beijing, China
Dongxiao Zhao
Aspire Information Technology (Beijing) Company Limited, Beijing, China
Peipei Li
Beijing University of Posts and Telecommunications (BUPT)
Computer Vision · Image Synthesis · Face Recognition