🤖 AI Summary
Existing zero-shot anomaly segmentation models rely on fixed textual prompts, which limits their adaptability across industrial domains. To address this, we propose an image-aware dynamic prompt generation framework that pioneers the integration of vision annotation models with large language models (LLMs) for prompt synthesis. Given an input image, our method automatically extracts semantic attributes and generates context-aware, scene-adaptive textual prompts to guide zero-shot segmentation. Crucially, it eliminates manual prompt engineering while significantly enhancing generalization. Evaluated on multiple industrial anomaly segmentation benchmarks, our approach achieves up to a 10% improvement in F1-max over prior methods, substantially improving robustness and practicality in dynamic, unstructured industrial environments.
📝 Abstract
Anomaly segmentation is essential for industrial quality control, maintenance, and operational stability. Existing text-guided zero-shot anomaly segmentation models are effective but rely on fixed prompts, limiting adaptability in diverse industrial scenarios and highlighting the need for flexible, context-aware prompting strategies. We propose Image-Aware Prompt Anomaly Segmentation (IAP-AS), which enhances anomaly segmentation by generating dynamic, context-aware prompts using an image tagging model and a large language model (LLM). IAP-AS extracts object attributes from images and uses them to synthesize prompts adapted to each scene, improving adaptability and generalization in dynamic and unstructured industrial environments. In our experiments, IAP-AS improves the F1-max metric by up to 10%, demonstrating superior adaptability and generalization and providing a scalable solution for anomaly segmentation across industries.
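The pipeline described above (image tagging → LLM prompt synthesis → text-guided segmentation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `tag_image`, `synthesize_prompts`, and `segment_anomalies` are hypothetical stand-ins for an image tagging model, an LLM call, and a text-guided zero-shot segmenter, respectively.

```python
from typing import List

def tag_image(image) -> List[str]:
    # Hypothetical stand-in for an image tagging model that extracts
    # semantic attributes (object name, defect type, scene context).
    # A real system would run a vision annotation model here.
    return ["metal plate", "scratch", "industrial surface"]

def synthesize_prompts(tags: List[str]) -> List[str]:
    # Hypothetical stand-in for an LLM call: turn the extracted tags
    # into context-aware normal/anomalous prompt pairs. A real system
    # would query an LLM instead of filling a fixed template.
    obj, defect = tags[0], tags[1]
    return [
        f"a photo of a flawless {obj}",
        f"a photo of a {obj} with a {defect}",
    ]

def segment_anomalies(image, prompts: List[str]):
    # Hypothetical text-guided zero-shot segmenter: score image regions
    # against the normal vs. anomalous prompts to produce an anomaly map.
    raise NotImplementedError("placeholder for a text-guided segmenter")

# Usage: prompts are derived from the image itself, so they adapt to
# whatever object and defect the tagger observes in each scene.
image = object()  # placeholder input
prompts = synthesize_prompts(tag_image(image))
print(prompts[0])  # "a photo of a flawless metal plate"
```

The key design point is that the textual prompts are a function of the input image rather than a fixed list, which is what gives the approach its cross-domain adaptability.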