RobustA: Robust Anomaly Detection in Multimodal Data

📅 2025-11-10

🤖 AI Summary

Real-world multimodal data often suffer from environmental interference, leading to corrupted audio-visual modalities that severely degrade anomaly detection performance. This work presents the first systematic investigation into the impact of modality corruption on multimodal anomaly detection and introduces the first benchmark dataset specifically designed for evaluating robustness under such corruptions. We propose a robust detection framework comprising three key components: (1) cross-modal shared representation learning to enforce semantic alignment across modalities; (2) a corruption-level-aware dynamic weighting fusion mechanism that adaptively adjusts modality contributions; and (3) a corruption-aware reasoning module to enhance feature discriminability. Extensive experiments across diverse corruption scenarios demonstrate significant improvements in robustness over state-of-the-art methods. To foster reproducibility and further research, the benchmark dataset and pre-extracted features will be publicly released.

📝 Abstract

In recent years, multimodal anomaly detection methods have demonstrated remarkable performance improvements over video-only models. However, real-world multimodal data is often corrupted by unforeseen environmental distortions. In this paper, we present the first work to comprehensively investigate the adverse effects of corrupted modalities on the multimodal anomaly detection task. To support this investigation, we propose RobustA, a carefully curated evaluation dataset for systematically observing the impact of audio and visual corruptions on the overall effectiveness of anomaly detection systems. Furthermore, we propose a multimodal anomaly detection method that shows notable resilience against corrupted modalities. The proposed method learns a shared representation space for different modalities and employs a dynamic weighting scheme during inference based on the estimated level of corruption. Our work represents a significant step toward the real-world application of multimodal anomaly detection, where modality corruption is likely to occur. The proposed evaluation dataset with corrupted modalities and the corresponding extracted features will be made publicly available.

Problem

Research questions and friction points this paper is trying to address.

Investigating adverse effects of corrupted modalities on multimodal anomaly detection

Addressing resilience against audio and visual corruptions in real-world data

Developing robust anomaly detection methods for corrupted multimodal inputs
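
To make the idea of a controlled modality corruption concrete, here is a minimal sketch of one way an audio track could be degraded at graded severity levels. It is an illustration only: the abstract does not say which corruption types or severities RobustA uses, so the white Gaussian noise model, the function names `add_noise_at_snr` and `measured_snr_db`, and the `severity_to_snr` mapping are all assumptions.

```python
import math
import random


def add_noise_at_snr(signal, snr_db, seed=0):
    """Return a copy of `signal` with white Gaussian noise at roughly `snr_db` dB SNR.

    Hypothetical helper: the corruption model is an assumption, not taken from the paper.
    """
    rng = random.Random(seed)
    signal_power = sum(x * x for x in signal) / len(signal)
    # SNR(dB) = 10 * log10(P_signal / P_noise)  =>  P_noise = P_signal / 10^(SNR/10)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean, noisy):
    """Empirical SNR in dB of `noisy` relative to `clean`."""
    signal_power = sum(c * c for c in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)


# A synthetic stand-in for an audio track: 5 cycles of a sine wave over 1000 samples.
clean = [math.sin(2 * math.pi * 5 * t / 1000) for t in range(1000)]

# Hypothetical severity levels mapped to target SNRs (higher level = more corruption).
severity_to_snr = {1: 20.0, 3: 10.0, 5: 0.0}
corrupted = {
    level: add_noise_at_snr(clean, snr_db)
    for level, snr_db in severity_to_snr.items()
}
```

Running a detector on each entry of `corrupted` and comparing its scores with those obtained on `clean` would give a simple robustness curve over severity.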

Innovation

Methods, ideas, or system contributions that make the work stand out.

Learns shared representation space for modalities

Employs dynamic weighting scheme during inference

Estimates corruption level to enhance resilience
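
The dynamic weighting idea can be sketched as a late-fusion rule in which each modality's anomaly score is weighted by how clean that modality is estimated to be. This is a minimal sketch under stated assumptions, not the paper's formulation: the softmax over negative corruption estimates, the `temperature` parameter, and the names `fusion_weights` and `fuse_scores` are hypothetical, and the corruption estimates are assumed to be supplied by some upstream estimator.

```python
import math


def fusion_weights(corruption_levels, temperature=1.0):
    """Softmax over negative corruption estimates, so cleaner modalities weigh more.

    Hypothetical weighting rule chosen for illustration.
    """
    logits = [-c / temperature for c in corruption_levels]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scores(scores, corruption_levels, temperature=1.0):
    """Weighted sum of per-modality anomaly scores."""
    weights = fusion_weights(corruption_levels, temperature)
    return sum(w * s for w, s in zip(weights, scores))


# Made-up per-modality anomaly scores in [0, 1]: [visual, audio].
scores = [0.9, 0.2]

# Both modalities estimated clean: the weights are equal and the result is the mean.
clean_fused = fuse_scores(scores, [0.0, 0.0])

# Audio estimated heavily corrupted: the fused score moves toward the visual score.
noisy_audio_fused = fuse_scores(scores, [0.0, 1.0], temperature=0.25)
```

A lower `temperature` makes the fusion switch more sharply toward the cleaner modality, while a higher one keeps it closer to a plain average.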
