MMR-AD: A Large-Scale Multimodal Dataset for Benchmarking General Anomaly Detection with Multimodal Large Language Models

📅 2026-04-13

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the limited generalization of existing anomaly detection methods to unseen categories and the absence of benchmark datasets tailored for multimodal large language models (MLLMs). To bridge this gap, we introduce MMR-AD, the first large-scale multimodal image-text dataset specifically designed for general anomaly detection, along with Anomaly-R1, a reasoning-enhanced baseline model that integrates chain-of-thought reasoning and reinforcement learning. Experimental results demonstrate that Anomaly-R1 significantly outperforms current generalist MLLMs in both anomaly detection and localization on MMR-AD, offering improved alignment with real-world industrial requirements.

📝 Abstract

General anomaly detection (GAD) is an emerging trend in industrial anomaly detection (AD) and also its ultimate goal. Unlike conventional single- and multi-class AD, GAD aims to train one general model that can directly detect anomalies in diverse novel classes without any retraining or fine-tuning on the target data. Recently, Multimodal Large Language Models (MLLMs) have shown great promise for GAD due to their revolutionary visual understanding and language reasoning capabilities. However, the GAD ability of MLLMs remains underexplored for two reasons: (1) MLLMs are pretrained on large amounts of data sourced from the Web, which still differ significantly from the data in AD scenarios, and the image-text pairs used during pretraining are not specific to AD tasks. (2) The current mainstream AD datasets are image-based and not yet suitable for post-training MLLMs. To facilitate MLLM-based GAD research, we present MMR-AD, a comprehensive benchmark for both training and evaluating MLLM-based AD models. With MMR-AD, we reveal that the AD performance of current SOTA generalist MLLMs still falls far short of industrial requirements. Based on MMR-AD, we also propose a baseline model, Anomaly-R1, a reasoning-based AD model that learns from the chain-of-thought (CoT) data in MMR-AD and is further enhanced by reinforcement learning. Extensive experiments show that Anomaly-R1 achieves remarkable improvements over generalist MLLMs in both anomaly detection and localization.

Problem

Research questions and friction points this paper is trying to address.

General Anomaly Detection

Multimodal Large Language Models

Benchmark Dataset

Industrial Anomaly Detection

Multimodal Learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal Large Language Models

General Anomaly Detection

MMR-AD Dataset