AgentIAD: Tool-Augmented Single-Agent for Industrial Anomaly Detection

📅 2025-12-15
🤖 AI Summary
Industrial anomaly detection (IAD) faces two challenges: normal reference samples are scarce, and many defects are subtle and localized. AgentIAD is a tool-augmented single-agent framework that combines a Perceptive Zoomer, which inspects local regions at finer scale, with a Comparative Retriever, which queries normal exemplars for reference comparison when the evidence is ambiguous. The agent carries out multi-stage visual inspection along structured trajectories built from the MMAD dataset. Training proceeds in two stages, supervised fine-tuning followed by reinforcement learning with a two-part reward: a perception reward covering classification accuracy, spatial alignment, and type correctness, and a behavior reward that encourages efficient tool use. On the MMAD benchmark the method reaches 97.62% classification accuracy, a new state of the art that surpasses prior approaches based on multimodal large language models (MLLMs), and it produces interpretable, step-by-step inspection traces with explicit tool usage.

📝 Abstract
Industrial anomaly detection (IAD) is difficult due to the scarcity of normal reference samples and the subtle, localized nature of many defects. Single-pass vision-language models (VLMs) often overlook small abnormalities and lack explicit mechanisms to compare against canonical normal patterns. We propose AgentIAD, a tool-driven agentic framework that enables multi-stage visual inspection. The agent is equipped with a Perceptive Zoomer (PZ) for localized fine-grained analysis and a Comparative Retriever (CR) for querying normal exemplars when evidence is ambiguous. To teach these inspection behaviors, we construct structured perceptive and comparative trajectories from the MMAD dataset and train the model in two stages: supervised fine-tuning followed by reinforcement learning. A two-part reward design drives this process: a perception reward that supervises classification accuracy, spatial alignment, and type correctness, and a behavior reward that encourages efficient tool use. Together, these components enable the model to refine its judgment through step-wise observation, zooming, and verification. AgentIAD achieves a new state-of-the-art 97.62% classification accuracy on MMAD, surpassing prior MLLM-based approaches while producing transparent and interpretable inspection traces.
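The two-part reward described in the abstract can be illustrated with a minimal sketch. The abstract names only the ingredients (classification accuracy, spatial alignment, type correctness, efficient tool use), so everything else here is an assumption made for illustration: using box IoU as the spatial-alignment term, averaging the three perception terms equally, and the hypothetical `budget` and `penalty` parameters of the behavior term. It is not the paper's actual reward.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed box format


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


@dataclass
class Episode:
    """Outcome of one inspection trajectory (hypothetical container)."""
    pred_anomalous: bool
    true_anomalous: bool
    pred_box: Optional[Box] = None
    true_box: Optional[Box] = None
    pred_type: Optional[str] = None
    true_type: Optional[str] = None
    tool_calls: int = 0


def perception_reward(ep: Episode) -> float:
    """Classification, spatial alignment, and type terms, each in [0, 1]."""
    cls = 1.0 if ep.pred_anomalous == ep.true_anomalous else 0.0
    if not ep.true_anomalous:
        return cls  # normal image: no box or defect type to score
    has_boxes = ep.pred_box is not None and ep.true_box is not None
    loc = iou(ep.pred_box, ep.true_box) if has_boxes else 0.0
    typ = 1.0 if ep.pred_type is not None and ep.pred_type == ep.true_type else 0.0
    return (cls + loc + typ) / 3.0  # equal weighting is an assumption


def behavior_reward(ep: Episode, budget: int = 2, penalty: float = 0.1) -> float:
    """Penalize tool calls beyond an assumed budget to favor efficient use."""
    return -penalty * max(0, ep.tool_calls - budget)


def total_reward(ep: Episode) -> float:
    return perception_reward(ep) + behavior_reward(ep)
```

Under these assumptions, a trajectory that classifies, localizes, and types a defect correctly within the tool budget scores 1.0, and each extra tool call beyond the budget subtracts 0.1.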
Problem

Research questions and friction points this paper is trying to address.

Detects subtle industrial defects with limited normal samples
Addresses single-pass models' oversight of small abnormalities
Enables multi-stage visual inspection via tool-augmented agents
Innovation

Methods, ideas, or system contributions that make the work stand out.

Tool-driven agentic framework for multi-stage visual inspection
Perceptive Zoomer and Comparative Retriever for fine-grained analysis
Two-stage training with supervised fine-tuning and reinforcement learning
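The multi-stage inspection loop summarized in the points above might look roughly like the following sketch. It is a toy under stated assumptions: images are plain nested lists, the Perceptive Zoomer is reduced to a crop, the Comparative Retriever to a dictionary lookup, and `scripted_policy` is a hypothetical stand-in for the trained model. The function names and action format are illustrative and do not come from the paper.

```python
from typing import Callable, Dict, List, Tuple

Image = List[List[int]]          # grayscale rows; stand-in for a real image
Action = Dict[str, object]       # e.g. {"tool": "zoom", "box": (x1, y1, x2, y2)}


def perceptive_zoom(image: Image, box: Tuple[int, int, int, int]) -> Image:
    """Toy Perceptive Zoomer: crop the region (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in image[y1:y2]]


def comparative_retrieve(category: str, bank: Dict[str, Image]) -> Image:
    """Toy Comparative Retriever: look up a normal exemplar by category."""
    return bank[category]


def inspect(image: Image, category: str, bank: Dict[str, Image],
            policy: Callable[[List[Image], List[Action]], Action],
            max_steps: int = 4) -> List[Action]:
    """Run the observe / zoom / retrieve / answer loop and return the trace."""
    views: List[Image] = [image]
    trace: List[Action] = []
    for _ in range(max_steps):
        action = policy(views, trace)
        trace.append(action)
        if action["tool"] == "zoom":
            views.append(perceptive_zoom(image, action["box"]))
        elif action["tool"] == "retrieve":
            views.append(comparative_retrieve(category, bank))
        else:  # "answer" ends the episode
            break
    return trace


def scripted_policy(views: List[Image], trace: List[Action]) -> Action:
    """Hypothetical fixed policy: zoom, retrieve, then compare the crops."""
    if len(trace) == 0:
        return {"tool": "zoom", "box": (1, 1, 3, 3)}
    if len(trace) == 1:
        return {"tool": "retrieve"}
    crop, exemplar = views[1], views[2]
    reference_crop = perceptive_zoom(exemplar, trace[0]["box"])
    return {"tool": "answer", "anomalous": crop != reference_crop}


normal = [[0] * 4 for _ in range(4)]
defective = [row[:] for row in normal]
defective[2][2] = 9  # a single-pixel "defect"
trace = inspect(defective, "toy_part", {"toy_part": normal}, scripted_policy)
```

The returned `trace` records each tool call in order, which is the kind of step-by-step inspection record the abstract describes. If the policy never answers, the loop stops after `max_steps` and the trace simply has no final answer.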
Junwen Miao
AIR, Tsinghua University
Penghui Du
Southern University of Science and Technology, Undergraduate
Neuroscience, Machine Learning, fMRI imaging
Yi Liu
Beihang University
Yu Wang
Tsinghua University
Yan Wang
AIR, Tsinghua University