Training-Free Anomaly Generation via Dual-Attention Enhancement in Diffusion Model

📅 2025-08-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
Industrial anomaly detection has long been hindered by the scarcity of authentic anomaly samples. To address this, we propose AAG, a training-free anomaly generation framework built upon Stable Diffusion. AAG localizes anomaly regions via visual masking and controls semantic attributes through text prompts. It introduces a dual-attention enhancement mechanism: Cross-Attention Enhancement (CAE) ensures the textual fidelity of generated anomalies, while Self-Attention Enhancement (SAE) preserves local texture consistency between anomalies and the original image. Crucially, AAG synthesizes high-fidelity, spatially controllable anomalies without requiring additional training data or fine-tuning. Extensive experiments on MVTec AD and VisA demonstrate that AAG significantly boosts the performance of diverse downstream anomaly detection models. Our approach establishes an efficient, general-purpose synthesis paradigm for industrial anomaly detection under data-scarce conditions.

📝 Abstract
Industrial anomaly detection (AD) plays a significant role in manufacturing, where a long-standing challenge is data scarcity. A growing body of work has emerged to address insufficient anomaly data via anomaly generation. However, existing anomaly generation methods either lack fidelity or require training with extra data. To this end, we propose a training-free anomaly generation framework dubbed AAG, which builds on Stable Diffusion (SD)'s strong generation ability for effective anomaly image generation. Given a normal image, a mask, and a simple text prompt, AAG can generate realistic and natural anomalies in the specified regions while keeping the content of other regions unchanged. In particular, we propose Cross-Attention Enhancement (CAE), which re-engineers the cross-attention mechanism within Stable Diffusion based on the given mask. CAE increases the similarity between visual tokens in the specified regions and the text embeddings, guiding the generated visual tokens to follow the text description. Additionally, generated anomalies need to appear natural and plausible on the object in the given image. We therefore propose Self-Attention Enhancement (SAE), which increases the similarity between each normal visual token and the anomaly visual tokens, ensuring that generated anomalies are coherent with the original pattern. Extensive experiments on the MVTec AD and VisA datasets demonstrate the effectiveness of AAG in anomaly generation and its utility. Furthermore, anomaly images generated by AAG bolster the performance of various downstream anomaly inspection tasks.
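The dual-attention mechanism described above can be sketched as a re-weighting of attention logits. The snippet below is a minimal, hypothetical illustration (not the authors' implementation): CAE adds a positive bias between masked visual-token queries and the anomaly-describing text tokens, while SAE adds a bias between anomaly queries and normal keys so the generated region stays coherent with its surroundings. The function names, `boost` hyperparameter, and toy shapes are all assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def enhanced_cross_attention(q, k, v, region_mask, text_idx, boost=3.0):
    """Hypothetical CAE sketch: bias logits between visual tokens inside the
    mask (queries) and selected text-embedding tokens (keys) so the masked
    region follows the anomaly description in the prompt."""
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d)                # (n_visual, n_text)
    logits[np.ix_(region_mask, text_idx)] += boost
    attn = softmax(logits, axis=-1)
    return attn @ v, attn

def enhanced_self_attention(q, k, v, region_mask, boost=2.0):
    """Hypothetical SAE sketch: bias logits where the query lies inside the
    anomaly mask and the key lies outside it, pulling generated anomaly
    tokens toward the surrounding normal texture."""
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d)                # (n, n)
    logits[np.ix_(region_mask, ~region_mask)] += boost
    attn = softmax(logits, axis=-1)
    return attn @ v, attn
```

With `boost=0` both functions reduce to standard scaled dot-product attention; the bias only redistributes attention mass for queries inside the mask, leaving unmasked tokens untouched.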
Problem

Research questions and friction points this paper is trying to address.

Addresses data scarcity in industrial anomaly detection
Generates realistic anomalies without extra training
Enhances cross-attention and self-attention for coherent anomalies
Innovation

Methods, ideas, or system contributions that make the work stand out.

Training-free anomaly generation via Stable Diffusion
Cross-Attention Enhancement for region-specific anomalies
Self-Attention Enhancement for natural anomaly coherence
Zuo Zuo
Xi'an Jiaotong University
Jiahao Dong
Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ)
Yanyun Qu
Xiamen University
Zongze Wu
College of Mechatronics and Control Engineering, Shenzhen University