TMUAD: Enhancing Logical Capabilities in Unified Anomaly Detection Models with a Text Memory Bank

📅 2025-08-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing unified anomaly detection methods rely on image feature extractors and memory banks, and they struggle to identify logical anomalies when normal samples are scarce. To address this, we propose TMUAD, a framework that introduces a *text memory bank* to explicitly model semantic and logical relationships among objects. TMUAD builds three complementary memory banks (class-level text, object-level image, and patch-level image features) so that structural and logical anomalies can be detected jointly. The method combines a logic-aware text extractor, an image segmentation module, and a visual encoder; for each query image it retrieves the most similar normal samples from every bank, computes anomaly scores at multiple levels, and fuses them into a final score. Evaluated on seven public industrial and medical datasets, TMUAD achieves state-of-the-art performance, particularly on logical anomaly detection. The code and model are publicly released.

📝 Abstract

Anomaly detection, which aims to identify anomalies deviating from normal patterns, is challenging due to the limited amount of normal data available. Unlike most existing unified methods that rely on carefully designed image feature extractors and memory banks to capture logical relationships between objects, we introduce a text memory bank to enhance the detection of logical anomalies. Specifically, we propose a Three-Memory framework for Unified structural and logical Anomaly Detection (TMUAD). First, we build a class-level text memory bank for logical anomaly detection by the proposed logic-aware text extractor, which can capture rich logical descriptions of objects from input images. Second, we construct an object-level image memory bank that preserves complete object contours by extracting features from segmented objects. Third, we employ visual encoders to extract patch-level image features for constructing a patch-level memory bank for structural anomaly detection. These three complementary memory banks are used to retrieve and compare normal images that are most similar to the query image, compute anomaly scores at multiple levels, and fuse them into a final anomaly score. By unifying structural and logical anomaly detection through collaborative memory banks, TMUAD achieves state-of-the-art performance across seven publicly available datasets involving industrial and medical domains. The model and code are available at https://github.com/SIA-IDE/TMUAD.
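The retrieval step described in the abstract can be illustrated with a toy sketch. It assumes each memory bank is simply a list of feature vectors from normal samples and that the anomaly score is the cosine distance to the nearest stored vector; the abstract does not state the distance or retrieval rule TMUAD uses, so `cosine_distance`, `memory_bank_score`, and the toy vectors below are hypothetical names and data chosen for illustration only.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # assumption: treat a zero vector as maximally dissimilar
    return 1.0 - dot / (norm_a * norm_b)

def memory_bank_score(query, bank):
    """Score a query by its distance to the nearest normal feature in the bank."""
    if not bank:
        raise ValueError("memory bank must contain at least one normal feature")
    return min(cosine_distance(query, item) for item in bank)

# Toy vectors standing in for features of normal samples (not real embeddings).
text_bank = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]

normal_query = [1.0, 0.0, 0.0]  # matches a stored normal feature
odd_query = [0.0, 0.0, 1.0]     # orthogonal to everything in the bank

normal_score = memory_bank_score(normal_query, text_bank)
odd_score = memory_bank_score(odd_query, text_bank)
```

Under these assumptions the same function could score a query against each of the three banks, giving one score per level; a query that resembles a stored normal sample scores near zero, while one unlike anything in the bank scores higher.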

Problem

Research questions and friction points this paper is trying to address.

Detecting logical anomalies with limited normal data

Unifying structural and logical anomaly detection methods

Enhancing object relationship understanding through text memory

Innovation

Methods, ideas, or system contributions that make the work stand out.

Text memory bank for logical anomaly detection

Three-memory framework combining structural and logical anomaly detection

Multi-level anomaly score fusion from complementary banks
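One plausible way to fuse scores from several banks is sketched below. It is an assumption, not the rule from the paper: each raw score is standardized against scores observed on normal samples, and the standardized values are combined by a weighted average. The function names, bank keys, weights, and reference values are all hypothetical.

```python
def standardize(score, normal_scores):
    """Z-score a raw score against scores observed on normal samples."""
    mean = sum(normal_scores) / len(normal_scores)
    var = sum((s - mean) ** 2 for s in normal_scores) / len(normal_scores)
    std = var ** 0.5
    # Fall back to mean-centering when the reference scores have no spread.
    return (score - mean) / std if std > 0 else score - mean

def fuse_scores(raw, reference, weights):
    """Weighted average of per-bank standardized scores (illustrative rule)."""
    total_weight = sum(weights.values())
    weighted = sum(weights[k] * standardize(raw[k], reference[k]) for k in raw)
    return weighted / total_weight

# Hypothetical reference scores collected from normal images, per bank.
reference = {
    "text": [0.1, 0.2, 0.3],
    "object": [1.0, 2.0, 3.0],
    "patch": [10.0, 20.0, 30.0],
}
weights = {"text": 1.0, "object": 1.0, "patch": 1.0}

# A query that looks typical at every level.
fused_normal = fuse_scores({"text": 0.2, "object": 2.0, "patch": 20.0}, reference, weights)

# A query that is typical structurally but unusual at the text (logical) level.
fused_logical = fuse_scores({"text": 0.6, "object": 2.0, "patch": 20.0}, reference, weights)
```

Standardizing first keeps a bank with a large raw score range (here, the patch level) from dominating the fused value, which is why a deviation at the text level alone still raises the final score.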

🔎 Similar Papers

What Information Contributes to Log-based Anomaly Detection? Insights from a Configurable Transformer-Based Approach