AI Summary
This work addresses the limitations of existing zero- and few-shot anomaly detection methods, which rely heavily on auxiliary training data and score anomalies solely by visual-text embedding similarity, and therefore cannot reason about complex anomalies such as logical or contextual inconsistencies. To overcome this, the paper introduces AnomalyAgent, the first training-free agent framework for zero-shot anomaly detection, which brings agent-based reasoning into this domain. Built on a multimodal large language model, AnomalyAgent combines an anomaly-oriented toolset, a customized memory module, and tailored zero- and few-shot prompting strategies to enable context-aware, adaptive reasoning. Experimental results show that AnomalyAgent significantly outperforms current training-free vision-language models and general-purpose agent approaches across diverse anomaly types, indicating strong generalization capability.
Abstract
Benefiting from the generalizability of vision-language models (VLMs) such as CLIP, many zero-/few-shot anomaly detection (AD) approaches have achieved impressive detection performance across various datasets. Nevertheless, they require substantial training on large auxiliary datasets to adapt VLMs to anomaly detection, and their inference relies largely on anomaly scores derived from visual-text embedding similarity, so they lack the reasoning ability needed to detect complex anomalies that require in-depth contextual understanding. To address this limitation, we propose AnomalyAgent, a novel training-free, agentic framework that leverages the advanced reasoning and generalization capabilities of multimodal large language models (MLLMs) for anomaly detection. The key ingredients include 1) a comprehensive anomaly-centric toolset that enables adaptive, MLLM-driven, agentic anomaly reasoning in zero-shot settings, and 2) a customized memory module that grounds anomaly reasoning in few-shot, in-context reference examples. We extend evaluation beyond the simple anomalies found in widely used benchmarks (e.g., surface defects such as cracks and dents, and clear lesions) to more diverse types, such as logical/contextual anomalies in logistics and manufacturing settings. Extensive experimental results demonstrate that AnomalyAgent achieves substantially better performance than training-free VLM-based AD and generic agentic methods, highlighting its superior generalization capability in both zero-shot and few-shot anomaly detection settings. The code implementation can be found at this address.
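For context, the similarity-based scoring that the abstract contrasts against can be sketched as follows. This is a minimal illustration of the general idea only, not AnomalyAgent and not any specific published method: an image embedding is compared with text embeddings for a "normal" and an "abnormal" prompt, and a softmax over the two cosine similarities gives an anomaly score. The function names, the temperature of 0.07, and the hand-made 3-dimensional vectors are all assumptions made for this example; in practice the embeddings would come from a VLM encoder such as CLIP.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_anomaly_score(image_emb, normal_text_emb, abnormal_text_emb,
                             temperature=0.07):
    """Softmax over two scaled similarities; returns a score in (0, 1).

    Higher values mean the image embedding is closer to the "abnormal"
    text embedding than to the "normal" one. The temperature is an
    assumed value chosen for illustration.
    """
    s_normal = cosine(image_emb, normal_text_emb) / temperature
    s_abnormal = cosine(image_emb, abnormal_text_emb) / temperature
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(s_normal, s_abnormal)
    e_normal = math.exp(s_normal - m)
    e_abnormal = math.exp(s_abnormal - m)
    return e_abnormal / (e_normal + e_abnormal)


# Toy embeddings (hypothetical stand-ins for real encoder outputs).
normal_text = [1.0, 0.0, 0.0]
abnormal_text = [0.0, 1.0, 0.0]
image_close_to_normal = [0.9, 0.1, 0.0]
image_close_to_abnormal = [0.1, 0.9, 0.0]

score_normal_image = similarity_anomaly_score(
    image_close_to_normal, normal_text, abnormal_text)
score_abnormal_image = similarity_anomaly_score(
    image_close_to_abnormal, normal_text, abnormal_text)
```

A score of this kind is a single number computed from embedding geometry, which is why the abstract argues it cannot capture logical or contextual anomalies that depend on reasoning over the scene.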