LogicAD: Explainable Anomaly Detection via VLM-based Text Feature Extraction

📅 2025-01-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
To detect logical anomalies in industrial quality inspection, this paper proposes LogicAD, an explainable detection framework built on an autoregressive vision-language model (AVLM). The approach uses the AVLM to extract textual features from images, applies format embedding, and couples the result with a logic reasoner, so that it both classifies anomalies and explains them in natural language. According to the authors, the application of AVLMs to logical anomaly detection was previously unexplored, and the method is positioned as an alternative to earlier approaches that require extensive manual annotations, significant computing power, and large amounts of training data. On the MVTec LOCO AD benchmark, the method achieves an AUROC of 86.0% and an F1-max of 83.7%, outperforming the previous state-of-the-art method by a large margin while also producing textual explanations of the detected anomalies.
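The general idea of turning an image into text and then checking that text against logical rules can be illustrated with a toy sketch. This is not the authors' implementation: `describe_image`, the canned facts, the tray/bolt/washer scenario, and the `RULES` list are all hypothetical stand-ins, the AVLM call is replaced by a lookup table, and the format embedding step is omitted entirely.

```python
from dataclasses import dataclass, field


@dataclass
class Verdict:
    is_anomalous: bool
    explanations: list = field(default_factory=list)


def describe_image(image_id):
    """Hypothetical stand-in for the AVLM step.

    A real system would prompt a vision-language model and parse its
    textual answer; here we return canned object counts instead.
    """
    canned = {
        "normal": {"bolt": 2, "washer": 1},
        "missing_washer": {"bolt": 2, "washer": 0},
    }
    return canned[image_id]


# Hypothetical logical constraints describing a "normal" image.
# Each rule pairs a human-readable statement with a predicate over the facts.
RULES = [
    ("the tray holds exactly 2 bolts", lambda f: f.get("bolt", 0) == 2),
    ("the tray holds exactly 1 washer", lambda f: f.get("washer", 0) == 1),
]


def check(facts, rules):
    """Flag the image as anomalous if any rule fails, and say which one."""
    violated = [text for text, holds in rules if not holds(facts)]
    return Verdict(
        is_anomalous=bool(violated),
        explanations=[f"violated rule: {text}" for text in violated],
    )


if __name__ == "__main__":
    for image_id in ("normal", "missing_washer"):
        verdict = check(describe_image(image_id), RULES)
        print(image_id, verdict.is_anomalous, verdict.explanations)
```

Because the rules are stated in readable text, the explanation falls out of the check itself: the violated rule is the reason the image was flagged.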

📝 Abstract
Logical image understanding involves interpreting and reasoning about the relationships and consistency within an image's visual content. This capability is essential in applications such as industrial inspection, where logical anomaly detection is critical for maintaining high quality standards and minimizing costly recalls. Previous research in anomaly detection (AD) has relied on prior knowledge for designing algorithms, which often requires extensive manual annotations, significant computing power, and large amounts of data for training. Autoregressive, multimodal Vision Language Models (AVLMs) offer a promising alternative due to their exceptional performance in visual reasoning across various domains. Despite this, their application to logical AD remains unexplored. In this work, we investigate using AVLMs for logical AD and demonstrate that they are well-suited to the task. Combining AVLMs with format embedding and a logic reasoner, we achieve SOTA performance on the public MVTec LOCO AD benchmark, with an AUROC of 86.0% and F1-max of 83.7%, along with explanations of anomalies. This outperforms the existing SOTA method by a large margin.
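For readers unfamiliar with the two reported metrics, the sketch below shows how AUROC and F1-max are commonly computed from per-image anomaly scores. It uses the standard definitions (AUROC as the probability that an anomalous sample outscores a normal one, F1-max as the best F1 over all score thresholds); the function names and the example labels and scores are made up for illustration and are not taken from the paper.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the fraction of (anomalous, normal) pairs in which the
    anomalous sample gets the higher score; ties count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def f1_max(labels, scores):
    """Best F1 score over all thresholds taken from the observed scores."""
    best = 0.0
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= t)
        fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < t)
        denom = 2 * tp + fp + fn
        if denom:
            best = max(best, 2 * tp / denom)
    return best


if __name__ == "__main__":
    # Made-up example: 1 = anomalous, 0 = normal.
    labels = [0, 0, 1, 1, 0, 1]
    scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
    print(f"AUROC:  {auroc(labels, scores):.3f}")
    print(f"F1-max: {f1_max(labels, scores):.3f}")
```

AUROC is threshold-free, while F1-max reports performance at the single best threshold, which is why the two are often quoted together.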
Problem

Research questions and friction points this paper is trying to address.

Automated Defect Detection
Logical Relationships Analysis
Quality Control
Innovation

Methods, ideas, or system contributions that make the work stand out.

AVLMs
Anomaly Detection
Explainability