Anomagic: Crossmodal Prompt-driven Zero-shot Anomaly Generation

📅 2025-11-13
🤖 AI Summary
This paper introduces a zero-shot anomaly generation framework that addresses data scarcity in anomaly detection without requiring exemplar anomalies. It uses crossmodal prompt encoding (vision + text) to steer an inpainting-based generation pipeline, applies a contrastive refinement strategy to tighten the alignment between generated anomalies and their masks, and relies on multimodal large language models to generate captions automatically. Key contributions include: (1) AnomVerse, a collection of 12,987 anomaly-mask-caption triplets assembled from 13 publicly available datasets; (2) support for user-defined prompts to synthesize semantically coherent anomalies on any normal-category image; and (3) larger improvements in downstream anomaly detection than prior anomaly generation methods.

📝 Abstract
We propose Anomagic, a zero-shot anomaly generation method that produces semantically coherent anomalies without requiring any exemplar anomalies. By unifying both visual and textual cues through a crossmodal prompt encoding scheme, Anomagic leverages rich contextual information to steer an inpainting-based generation pipeline. A subsequent contrastive refinement strategy enforces precise alignment between synthesized anomalies and their masks, thereby bolstering downstream anomaly detection accuracy. To facilitate training, we introduce AnomVerse, a collection of 12,987 anomaly-mask-caption triplets assembled from 13 publicly available datasets, where captions are automatically generated by multimodal large language models using structured visual prompts and template-based textual hints. Extensive experiments demonstrate that Anomagic trained on AnomVerse can synthesize more realistic and varied anomalies than prior methods, yielding superior improvements in downstream anomaly detection. Furthermore, Anomagic can generate anomalies for any normal-category image using user-defined prompts, establishing a versatile foundation model for anomaly generation.
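To make the anomaly-mask-caption triplet and the mask-guided inpainting idea more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the `AnomalyTriplet` class, the `composite` function, and the toy 2x2 grayscale images are all hypothetical, and the "generated" pixels stand in for the output of a real generative model. The sketch only illustrates the general inpainting property that pixels outside the mask are kept from the normal image.

```python
from dataclasses import dataclass
from typing import List

Image = List[List[float]]  # grayscale image as nested lists, values in [0, 1]
Mask = List[List[int]]     # binary mask, 1 marks the anomalous region


@dataclass
class AnomalyTriplet:
    """Hypothetical container for one anomaly-mask-caption triplet."""
    image: Image
    mask: Mask
    caption: str


def composite(normal: Image, generated: Image, mask: Mask) -> Image:
    """Keep normal pixels outside the mask and generated pixels inside it."""
    return [
        [g if m else n for n, g, m in zip(n_row, g_row, m_row)]
        for n_row, g_row, m_row in zip(normal, generated, mask)
    ]


# Toy data: a uniform "normal" image and placeholder generated content.
normal = [[0.5, 0.5], [0.5, 0.5]]
generated = [[0.9, 0.1], [0.2, 0.8]]
mask = [[0, 1], [0, 0]]

triplet = AnomalyTriplet(
    image=composite(normal, generated, mask),
    mask=mask,
    caption="a dark scratch near the top-right corner",
)
```

Only the masked pixel changes in `triplet.image`; every other pixel is copied from the normal image, which is why the mask can serve directly as a pixel-level label for downstream detection.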
Problem

Research questions and friction points this paper is trying to address.

Generates zero-shot anomalies without exemplar data using crossmodal prompts
Improves downstream anomaly detection accuracy through a contrastive refinement strategy
Provides a versatile foundation model for anomaly generation on any normal-category image via user-defined prompts
Innovation

Methods, ideas, or system contributions that make the work stand out.

Zero-shot anomaly generation without exemplar anomalies
Crossmodal prompt encoding unifies visual and textual cues
Contrastive refinement aligns synthesized anomalies with masks
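The alignment between a synthesized anomaly and its mask can be checked with a simple diagnostic. The sketch below is an assumption-laden illustration, not the paper's contrastive refinement strategy: it thresholds the pixel difference between the normal and synthesized images (the `threshold` value and both function names are hypothetical) and reports the intersection-over-union between the changed region and the mask.

```python
from typing import List

Image = List[List[float]]
Mask = List[List[int]]


def changed_region(normal: Image, synthesized: Image, threshold: float = 0.1) -> Mask:
    """Mark pixels whose absolute change exceeds the threshold."""
    return [
        [1 if abs(s - n) > threshold else 0 for n, s in zip(n_row, s_row)]
        for n_row, s_row in zip(normal, synthesized)
    ]


def mask_iou(a: Mask, b: Mask) -> float:
    """Intersection-over-union of two binary masks (1.0 if both are empty)."""
    inter = sum(x & y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    union = sum(x | y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return inter / union if union else 1.0


normal = [[0.5, 0.5], [0.5, 0.5]]
synthesized = [[0.5, 0.1], [0.5, 0.9]]  # the bottom-right change leaks outside the mask
mask = [[0, 1], [0, 0]]

changed = changed_region(normal, synthesized)
score = mask_iou(changed, mask)  # 0.5: one of the two changed pixels lies inside the mask
```

A score below 1.0 means the synthesized change and the mask disagree somewhere, which is the kind of misalignment that would make the mask a noisy label for a downstream detector.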