AnoStyler: Text-Driven Localized Anomaly Generation via Lightweight Style Transfer

📅 2025-11-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the scarcity of anomalous images in real-world scenarios, the low visual fidelity of existing generation methods, their heavy reliance on large-scale annotated data, and model bloat, this paper proposes a lightweight, text-driven framework for localized anomaly generation. The method reformulates zero-shot anomaly generation as a text-guided style transfer task. Key innovations include a category-agnostic automatic mask generation module and a two-class (normal/anomalous) text prompt alignment mechanism, enabling semantically controllable and spatially precise localized anomaly synthesis. A lightweight U-Net supervised with CLIP-based perceptual losses substantially reduces memory footprint and computational overhead. Extensive experiments on MVTec-AD and VisA show that the approach surpasses state-of-the-art methods in generation quality, diversity, and downstream anomaly detection accuracy.

📝 Abstract
Anomaly generation has been widely explored to address the scarcity of anomaly images in real-world data. However, existing methods typically suffer from at least one of the following limitations, hindering their practical deployment: (1) lack of visual realism in generated anomalies; (2) dependence on large amounts of real images; and (3) use of memory-intensive, heavyweight model architectures. To overcome these limitations, we propose AnoStyler, a lightweight yet effective method that frames zero-shot anomaly generation as text-guided style transfer. Given a single normal image along with its category label and expected defect type, an anomaly mask indicating the localized anomaly regions and two-class text prompts representing the normal and anomaly states are generated using generalizable category-agnostic procedures. A lightweight U-Net model trained with CLIP-based loss functions is used to stylize the normal image into a visually realistic anomaly image, where anomalies are localized by the anomaly mask and semantically aligned with the text prompts. Extensive experiments on the MVTec-AD and VisA datasets show that AnoStyler outperforms existing anomaly generation methods in generating high-quality and diverse anomaly images. Furthermore, using these generated anomalies helps enhance anomaly detection performance.
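The abstract describes stylizing a normal image with a lightweight U-Net trained under CLIP-based losses, with anomalies confined to a mask and aligned with two-class text prompts. The sketch below illustrates that idea in miniature; it is not the paper's implementation. The CLIP image/text encoders are replaced by random stand-in projections (`W_img`, `e_norm_txt`, `e_anom_txt`) so the example runs without model weights, and the loss shown is a directional CLIP-style objective, one common choice for text-guided style transfer.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 32  # stand-in embedding size (CLIP uses 512+)

# Stand-in "image encoder": a random projection in place of a frozen CLIP encoder.
W_img = rng.standard_normal((3 * 8 * 8, DIM))

def encode_image(x):
    """x: (3, H, W) image -> embedding vector."""
    return x.reshape(-1) @ W_img

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def directional_loss(e_src, e_sty, e_norm_txt, e_anom_txt):
    # 1 - cos(image shift, text shift): pushes the stylized image away from
    # the source in the direction the anomaly prompt differs from the normal one.
    return 1.0 - cosine(e_sty - e_src, e_anom_txt - e_norm_txt)

normal = rng.random((3, 8, 8))       # single normal image
stylized = rng.random((3, 8, 8))     # stand-in for the U-Net's output
mask = (rng.random((1, 8, 8)) > 0.7).astype(float)  # localized anomaly mask

# Anomalies appear only where the mask is on; elsewhere the image stays normal.
composite = mask * stylized + (1 - mask) * normal

# Stand-in embeddings for the two-class prompts,
# e.g. "a photo of a normal capsule" vs. "a photo of a capsule with a crack".
e_norm_txt = rng.standard_normal(DIM)
e_anom_txt = rng.standard_normal(DIM)

loss = directional_loss(encode_image(normal), encode_image(composite),
                        e_norm_txt, e_anom_txt)
```

In training, `loss` would be backpropagated through the U-Net producing `stylized`, while the mask blending guarantees the anomaly stays spatially localized.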
Problem

Research questions and friction points this paper is trying to address.

Scarcity of real anomaly images in real-world data
Low visual realism of anomalies produced by existing generation methods
Dependence on large amounts of real images and heavyweight, memory-intensive models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Lightweight U-Net model for style transfer
Text-guided anomaly generation using CLIP
Category-agnostic anomaly mask generation
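The listing does not spell out how the category-agnostic masks are produced. A minimal sketch of one common approach, assuming thresholded low-frequency noise (the paper's actual procedure may differ): upsampled coarse noise is smoothed and thresholded so the mask forms a few contiguous blobs rather than salt-and-pepper pixels, with no object-category information needed.

```python
import numpy as np

def random_anomaly_mask(h=64, w=64, grid=8, keep=0.25, seed=0):
    """Category-agnostic localized mask from thresholded low-frequency noise.

    Returns a binary (h, w) mask whose foreground covers roughly `keep`
    of the image, arranged in smooth blobs.
    """
    rng = np.random.default_rng(seed)
    coarse = rng.random((grid, grid))
    # Nearest-neighbour upsample the coarse noise to full resolution.
    up = np.kron(coarse, np.ones((h // grid, w // grid)))
    # Simple 5x5 box blur to round the blob edges.
    pad = np.pad(up, 2, mode="edge")
    smooth = np.zeros_like(up)
    for i in range(5):
        for j in range(5):
            smooth += pad[i:i + h, j:j + w] / 25.0
    # Keep the top `keep` fraction of values as anomaly region.
    return (smooth > np.quantile(smooth, 1.0 - keep)).astype(np.uint8)

mask = random_anomaly_mask()
```

Thresholding by quantile rather than a fixed value keeps the anomalous area ratio stable across random draws, which helps when the masks condition a generator.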