🤖 AI Summary
Industrial anomaly detection needs synthetic defect samples that are both photorealistic and spatially precise, but existing diffusion models struggle to preserve global realism and local regional fidelity at the same time. To address this, we propose a region-constrained diffusion mechanism that freezes background features and updates only the anomaly regions during denoising. It is paired with an adversarial training framework whose discriminator is guided by pixel-level masks, enabling fine-grained spatial control and high-fidelity anomaly synthesis. The core innovation is embedding region-mask priors directly into the diffusion process and jointly optimizing the generator with a mask-aware discriminator. Evaluated on MVTec-AD and BTAD, our method achieves state-of-the-art performance, notably improving pixel-level anomaly segmentation accuracy (AUC ↑3.2%) and visual quality.
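The "freeze the background, update only the anomaly region" idea can be illustrated with a minimal NumPy sketch of a masked reverse-diffusion loop. This is not the authors' implementation. It assumes a RePaint-style blend in which, at every step, background pixels are replaced by the clean image re-noised to the matching noise level, so they become exactly the original background at the final step. The names `region_constrained_step`, `sample`, `denoise_step`, and `alpha_bars` are hypothetical; `denoise_step` stands in for whatever learned reverse update a real model would use.

```python
import numpy as np


def region_constrained_step(x_t, x0_background, mask, alpha_bar_prev, denoise_step, t, rng):
    """One reverse step in which only masked (anomaly) pixels come from the model.

    mask is 1 inside the anomaly region and 0 on the background.
    denoise_step is a hypothetical callable mapping (x_t, t) -> x_{t-1}.
    """
    # Model proposal for the whole image at step t-1.
    x_prev_model = denoise_step(x_t, t)
    # Assumption: the frozen background is the clean image forward-noised
    # to the noise level of step t-1 (alpha_bar_prev = 1.0 means no noise).
    noise = rng.standard_normal(x0_background.shape)
    x_prev_bg = np.sqrt(alpha_bar_prev) * x0_background + np.sqrt(1.0 - alpha_bar_prev) * noise
    # Keep model output inside the mask, frozen background outside it.
    return mask * x_prev_model + (1.0 - mask) * x_prev_bg


def sample(x0_background, mask, alpha_bars, denoise_step, seed=0):
    """Run the masked reverse process from pure noise down to t = 0."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(x0_background.shape)
    for t in range(len(alpha_bars) - 1, -1, -1):
        # alpha_bars[t] is the cumulative signal level at step t; the final step targets a clean image.
        alpha_bar_prev = alpha_bars[t - 1] if t > 0 else 1.0
        x = region_constrained_step(x, x0_background, mask, alpha_bar_prev, denoise_step, t, rng)
    return x


if __name__ == "__main__":
    image = np.full((4, 4), 0.5)
    mask = np.zeros((4, 4))
    mask[1:3, 1:3] = 1.0  # hypothetical 2x2 anomaly region
    alpha_bars = np.linspace(0.99, 0.1, 10)
    toy_denoiser = lambda x, t: 0.9 * x  # placeholder, not a trained model
    out = sample(image, mask, alpha_bars, toy_denoiser)
    print(out)
```

Because the final step uses `alpha_bar_prev = 1.0`, background pixels in the output equal the input image exactly, and only the masked pixels carry generated content.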
📝 Abstract
Synthesizing realistic and spatially precise anomalies is essential for improving the robustness of industrial anomaly detection systems. Although recent diffusion-based methods have shown strong capabilities in modeling complex defect patterns, they often lack spatial controllability and fail to preserve fine-grained regional fidelity. To overcome these limitations, we propose SARD (Segmentation-Aware anomaly synthesis via Region-constrained Diffusion with discriminative mask Guidance), a novel diffusion-based framework designed specifically for anomaly generation. Our approach introduces a Region-Constrained Diffusion (RCD) process that keeps the background frozen and updates only the foreground anomaly regions during reverse denoising, thereby reducing background artifacts. In addition, we incorporate a Discriminative Mask Guidance (DMG) module into the discriminator, which uses pixel-level masks to jointly assess global realism and local anomaly fidelity. Extensive experiments on the MVTec-AD and BTAD datasets show that SARD surpasses existing methods in segmentation accuracy and visual quality, establishing a new state of the art in pixel-level anomaly synthesis.
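The abstract does not give the DMG objective, so the following is only a minimal sketch of one plausible way a discriminator could combine a global realism score with a mask-restricted local score. It assumes the discriminator emits one global logit plus a per-pixel logit map, and that the local term is binary cross-entropy averaged over the anomaly mask only. The function names and the weight `lambda_local` are hypothetical and not taken from the paper.

```python
import numpy as np


def bce_with_logits(logits, targets):
    """Numerically stable elementwise binary cross-entropy on raw logits."""
    logits = np.asarray(logits, dtype=float)
    return np.maximum(logits, 0.0) - logits * targets + np.log1p(np.exp(-np.abs(logits)))


def mask_guided_d_loss(global_logit, pixel_logits, mask, is_real, lambda_local=1.0, eps=1e-8):
    """Hypothetical discriminator loss: global realism term plus a mask-weighted local term.

    global_logit: scalar logit judging the whole image.
    pixel_logits: (H, W) logit map judging each pixel.
    mask: (H, W) binary anomaly mask; only masked pixels contribute to the local term.
    """
    target = 1.0 if is_real else 0.0
    global_term = float(bce_with_logits(global_logit, target))
    per_pixel = bce_with_logits(pixel_logits, target)
    # Average the per-pixel loss over the anomaly region only.
    local_term = float((per_pixel * mask).sum() / (mask.sum() + eps))
    return global_term + lambda_local * local_term


if __name__ == "__main__":
    mask = np.zeros((4, 4))
    mask[1:3, 1:3] = 1.0
    loss = mask_guided_d_loss(0.0, np.zeros((4, 4)), mask, is_real=True)
    print(loss)  # about 2 * ln(2), since every logit is 0
```

Restricting the local term to the mask means background pixels never influence the anomaly-fidelity signal, which mirrors the abstract's separation of global realism from local anomaly fidelity; the actual SARD weighting and architecture may differ.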