🤖 AI Summary
To address the scarcity of anomalous images in real-world scenarios, the low visual fidelity of existing generation methods, their heavy reliance on large-scale annotated data, and their bloated model sizes, this paper proposes a lightweight, text-driven framework for local anomaly generation. The method reformulates zero-shot anomaly generation as a text-guided style transfer task. Key innovations include a category-agnostic automatic mask generation module and a two-class (normal/anomalous) text prompt alignment mechanism, enabling semantically controllable and spatially precise local anomaly synthesis. A lightweight U-Net supervised by CLIP-based perceptual losses keeps the memory footprint and computational overhead low. Extensive experiments on MVTec-AD and VisA demonstrate that the approach surpasses state-of-the-art methods in generation quality, diversity, and downstream anomaly detection accuracy.
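The two-class prompt mechanism can be pictured as a pair of contrasting text templates built from the category label and the expected defect type. The templates below are illustrative stand-ins, not the paper's actual prompt wording; any pair that contrasts the normal state with the defective state plays the same role:

```python
def make_prompts(category: str, defect: str) -> tuple[str, str]:
    """Build a (normal, anomaly) text-prompt pair for one object category.

    Hypothetical templates for illustration only: the point is that the
    two prompts differ solely in the presence of the defect description,
    so their embedding difference isolates the "direction" of the anomaly.
    """
    normal_prompt = f"a photo of a flawless {category}"
    anomaly_prompt = f"a photo of a {category} with {defect}"
    return normal_prompt, anomaly_prompt

# Example: one MVTec-AD-style category with a user-specified defect type.
normal_p, anomaly_p = make_prompts("hazelnut", "a crack")
```

Because the procedure uses only the category label and defect name as strings, it is category-agnostic: no per-class training data or annotation is needed to produce the prompt pair.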
📝 Abstract
Anomaly generation has been widely explored to address the scarcity of anomaly images in real-world data. However, existing methods typically suffer from at least one of the following limitations, hindering their practical deployment: (1) lack of visual realism in generated anomalies; (2) dependence on large numbers of real images; and (3) use of memory-intensive, heavyweight model architectures. To overcome these limitations, we propose AnoStyler, a lightweight yet effective method that frames zero-shot anomaly generation as text-guided style transfer. Given a single normal image along with its category label and expected defect type, an anomaly mask indicating the localized anomaly regions and two-class text prompts representing the normal and anomaly states are generated using generalizable category-agnostic procedures. A lightweight U-Net model trained with CLIP-based loss functions stylizes the normal image into a visually realistic anomaly image, where anomalies are localized by the anomaly mask and semantically aligned with the text prompts. Extensive experiments on the MVTec-AD and VisA datasets show that AnoStyler outperforms existing anomaly generation methods in generating high-quality and diverse anomaly images. Furthermore, the generated anomalies help enhance downstream anomaly detection performance.
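The two core mechanics of the pipeline, mask-localized stylization and CLIP-guided text alignment, can be sketched compactly. The sketch below is a minimal NumPy illustration under stated assumptions: `composite` shows how the mask confines the stylized (anomalous) appearance to the target region, and `directional_loss` shows a directional CLIP-style objective where the shift between image embeddings is aligned with the shift between the two text-prompt embeddings. The embeddings here are placeholder vectors, not outputs of an actual CLIP encoder, and the exact loss terms used by AnoStyler may differ:

```python
import numpy as np

def composite(normal: np.ndarray, stylized: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Keep the stylized (anomalous) appearance only inside the mask;
    pixels outside the mask remain identical to the normal image."""
    return mask * stylized + (1.0 - mask) * normal

def directional_loss(img_src: np.ndarray, img_out: np.ndarray,
                     txt_src: np.ndarray, txt_out: np.ndarray) -> float:
    """Directional loss sketch: 1 - cosine similarity between the
    image-embedding shift (normal -> generated) and the text-embedding
    shift (normal prompt -> anomaly prompt). Inputs are assumed to be
    embeddings from a frozen encoder; here they are plain vectors."""
    d_img = img_out - img_src
    d_txt = txt_out - txt_src
    cos = d_img @ d_txt / (np.linalg.norm(d_img) * np.linalg.norm(d_txt) + 1e-8)
    return float(1.0 - cos)

# Toy usage: a 4x4 "image" with a 2x2 anomaly region in the center.
normal = np.zeros((4, 4, 3))
stylized = np.ones((4, 4, 3))
mask = np.zeros((4, 4, 1))
mask[1:3, 1:3, 0] = 1.0
out = composite(normal, stylized, mask)
```

The compositing step is why only a single normal image is needed per generation: realism outside the anomaly region is guaranteed by construction, and the loss only has to shape the masked region toward the anomaly prompt.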