🤖 AI Summary
This work addresses the scarcity of real defective samples in industrial visual anomaly detection by proposing MIRAGE, a fully automatic anomaly generation pipeline that requires neither real anomalous images nor model training. The pipeline calls a general-purpose generative model (Gemini 2.5 Flash Image) as a black box, uses a vision-language model for automated prompt generation, applies CLIP-based quality filtering, and derives pixel-level masks with a lightweight, training-free dual-branch semantic change detection module that combines Grounding DINO and YOLOv26-Seg features. Experiments on MVTec AD and VisA evaluate the generated data on downstream anomaly segmentation and on visual quality (IS, IC-LPIPS, and a human perceptual study with 31 participants). The authors also release a public dataset of over 13,000 image-mask pairs, together with the generation prompts and pipeline code.
📝 Abstract
Industrial visual anomaly detection (VAD) methods are typically trained on normal samples only, yet performance improves substantially when even limited anomalous data is available. Existing anomaly generation approaches require real anomalous examples, demand expensive hardware, or produce synthetic defects that lack realism. We present MIRAGE (Model-agnostic Industrial Realistic Anomaly Generation and Evaluation), a fully automated pipeline for realistic anomalous image generation and pixel-level mask creation that requires no training and no anomalous images. Our pipeline accesses any generative model as a black box via API calls, uses a VLM for automatic defect prompt generation, and includes a CLIP-based quality filter to retain only well-aligned generated images. For mask generation at scale, we introduce a lightweight, training-free dual-branch semantic change detection module that combines text-conditioned Grounding DINO features with fine-grained YOLOv26-Seg structural features. We benchmark four generation methods using Gemini 2.5 Flash Image (Nano Banana) as the generative backbone, evaluating them on MVTec AD and VisA across two distinct tasks: (i) downstream anomaly segmentation and (ii) visual quality of the generated images, assessed via standard metrics (IS, IC-LPIPS) and a human perceptual study involving 31 participants and 1,550 pairwise votes. The results demonstrate that MIRAGE offers a scalable, accessible foundation for anomaly-aware industrial inspection that requires no real defect data. As a final contribution, we publicly release a large-scale dataset comprising 500 image-mask pairs per category for every MVTec AD and VisA class (over 13,000 pairs in total), alongside all generation prompts and pipeline code.
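The CLIP-based quality filter mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the scoring rule or the threshold, so everything below is an assumption: the sketch supposes that image and prompt embeddings have already been computed by some CLIP-style encoder, scores each generated image by cosine similarity to its prompt, and keeps those above a cutoff. The function names `cosine_similarity` and `filter_by_alignment` and the default `threshold=0.25` are hypothetical and not taken from the MIRAGE code.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_by_alignment(
    image_embeddings: Sequence[Sequence[float]],
    prompt_embedding: Sequence[float],
    threshold: float = 0.25,  # hypothetical cutoff, not from the paper
) -> List[int]:
    """Return indices of generated images whose embedding is aligned
    with the prompt embedding (similarity >= threshold)."""
    return [
        i
        for i, emb in enumerate(image_embeddings)
        if cosine_similarity(emb, prompt_embedding) >= threshold
    ]


if __name__ == "__main__":
    # Toy 2-D vectors standing in for real CLIP embeddings.
    images = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    prompt = [1.0, 0.0]
    print(filter_by_alignment(images, prompt, threshold=0.5))
```

In a real pipeline the embeddings would come from a CLIP image encoder and text encoder, and the threshold would presumably be tuned per dataset; the toy vectors here only show the keep-or-discard logic.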