AI Summary
Industrial visual anomaly detection faces two bottlenecks: real anomalous samples are scarce, and existing synthesis methods suffer from structural inconsistency and feature entanglement. To address these challenges, we propose a dual-helix diffusion framework that jointly generates high-fidelity anomalous images and their corresponding pixel-level masks. The method uses domain-decoupled attention to disentangle image and mask features, and introduces a semantic score map alignment module to keep generated anomalies structurally coherent and perceptually realistic. It also supports fine-grained control through text and sketch guidance. On benchmarks including MVTec-AD, the synthesized data is more diverse and more realistic than that of state-of-the-art approaches. Downstream anomaly detection models trained on this synthetic data show an average AUC improvement of 4.2%, demonstrating the framework's effectiveness in reducing annotation dependency and improving model generalization.
Abstract
Visual anomaly inspection is critical in manufacturing, yet it is hampered by the scarcity of real anomaly samples for training robust detectors. Synthetic data generation is a viable strategy for data augmentation; however, current methods remain constrained by two principal limitations: 1) the generated anomalies are structurally inconsistent with the normal background, and 2) undesirable feature entanglement between synthesized images and their corresponding annotation masks undermines the perceptual realism of the output. This paper introduces Double Helix Diffusion (DH-Diff), a novel cross-domain generative framework that simultaneously synthesizes high-fidelity anomaly images and their pixel-level annotation masks, explicitly addressing both limitations. DH-Diff employs an architecture inspired by a double helix, cycling through distinct modules for feature separation, connection, and merging. Specifically, a domain-decoupled attention mechanism mitigates feature entanglement by enhancing image and annotation features independently, while a semantic score map alignment module ensures structural authenticity by coherently integrating anomaly foregrounds. DH-Diff offers flexible control via text prompts and optional graphical guidance. Extensive experiments demonstrate that DH-Diff significantly outperforms state-of-the-art methods in diversity and authenticity, leading to substantial improvements in downstream anomaly detection performance.
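To make the idea of enhancing image and annotation features "independently" more concrete, the following is a minimal illustrative sketch, not the DH-Diff implementation. It assumes that "domain-decoupled" means each domain's tokens attend only to tokens of the same domain, and it contrasts this with joint attention over the concatenated tokens. The function names (`self_attention`, `decoupled_attention`, `joint_attention`) and the use of identity projections instead of learned query/key/value weights are assumptions made for illustration only.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: identity projections (Q = K = V = tokens) keep the
    sketch short; a real model would use learned projections.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out


def decoupled_attention(image_tokens, mask_tokens):
    """Hypothetical decoupled variant: each domain attends only to itself."""
    return self_attention(image_tokens), self_attention(mask_tokens)


def joint_attention(image_tokens, mask_tokens):
    """Baseline for contrast: one attention pass over both domains together."""
    out = self_attention(image_tokens + mask_tokens)
    n = len(image_tokens)
    return out[:n], out[n:]


if __name__ == "__main__":
    image = [[1.0, 0.0], [0.0, 1.0]]
    img_a, _ = decoupled_attention(image, [[0.5, 0.5]])
    img_b, _ = decoupled_attention(image, [[2.0, -1.0]])
    # Image-stream outputs do not depend on the mask tokens.
    print(img_a == img_b)
```

In this toy setting, changing the mask tokens leaves the image-stream output untouched under the decoupled variant, whereas under joint attention the image output shifts with the mask. That leakage is one plausible reading of the "feature entanglement" the abstract refers to.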