Towards Multimodal Time Series Anomaly Detection with Semantic Alignment and Condensed Interaction

📅 2026-03-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a limitation of existing time series anomaly detection methods: they focus predominantly on numerical signals and overlook the semantic complementarity of multimodal information such as text, and therefore struggle to achieve cross-modal semantic alignment and efficient interaction. The authors propose MindTS, a model that introduces fine-grained time-text semantic alignment and a content condensation-reconstruction mechanism. By combining cross-view text fusion, multimodal alignment, and cross-modal reconstruction, MindTS resolves semantic inconsistency and redundancy in heterogeneous data. Extensive experiments on six real-world multimodal datasets show that MindTS achieves leading or highly competitive anomaly detection performance relative to current state-of-the-art methods.
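The summary mentions fine-grained time-text alignment built from cross-view text fusion. Below is a minimal sketch of what such an alignment module could look like in PyTorch; the class name, dimensions, and the use of multi-head cross-attention for both fusion and alignment are illustrative assumptions, not the authors' implementation.

```python
# Sketch (not the paper's code): time-series patches and text tokens are
# projected into a shared space; two text views are fused by self-attention,
# then time patches attend over the fused tokens for fine-grained alignment.
import torch
import torch.nn as nn

class TimeTextAlignment(nn.Module):
    def __init__(self, d_time=64, d_text=768, d_model=128, n_heads=4):
        super().__init__()
        self.time_proj = nn.Linear(d_time, d_model)   # patch embeddings -> shared space
        self.text_proj = nn.Linear(d_text, d_model)   # token embeddings -> shared space
        # Cross-view text fusion: merge exogenous and endogenous text views
        self.view_fusion = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Time patches query the fused text tokens (fine-grained alignment)
        self.align_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, time_patches, exo_text, endo_text):
        t = self.time_proj(time_patches)                  # (B, P, d_model)
        exo = self.text_proj(exo_text)                    # (B, Le, d_model)
        endo = self.text_proj(endo_text)                  # (B, Li, d_model)
        text = torch.cat([exo, endo], dim=1)              # (B, Le+Li, d_model)
        fused, _ = self.view_fusion(text, text, text)     # cross-view fusion
        aligned, attn = self.align_attn(t, fused, fused)  # per-patch text context
        return aligned, attn

# Smoke test with random tensors
if __name__ == "__main__":
    m = TimeTextAlignment()
    t = torch.randn(2, 16, 64)     # 16 time patches
    e = torch.randn(2, 10, 768)    # exogenous text tokens
    i = torch.randn(2, 12, 768)    # endogenous text tokens
    out, attn = m(t, e, i)
    print(out.shape, attn.shape)   # (2, 16, 128), (2, 16, 22)
```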

📝 Abstract
Time series anomaly detection plays a critical role in many dynamic systems. Despite its importance, previous approaches have primarily relied on unimodal numerical data, overlooking the importance of complementary information from other modalities. In this paper, we propose a novel multimodal time series anomaly detection model (MindTS) that focuses on addressing two key challenges: (1) how to achieve semantically consistent alignment across heterogeneous multimodal data, and (2) how to filter out redundant modality information to enhance cross-modal interaction effectively. To address the first challenge, we propose Fine-grained Time-text Semantic Alignment. It integrates exogenous and endogenous text information through cross-view text fusion and a multimodal alignment mechanism, achieving semantically consistent alignment between time and text modalities. For the second challenge, we introduce Content Condenser Reconstruction, which filters redundant information within the aligned text modality and performs cross-modal reconstruction to enable interaction. Extensive experiments on six real-world multimodal datasets demonstrate that the proposed MindTS achieves competitive or superior results compared to existing methods. The code is available at: https://github.com/decisionintelligence/MindTS.
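The Content Condenser Reconstruction step described above (filter redundant text information, then reconstruct across modalities) could be sketched as follows. The top-k token scoring rule, the transformer decoder, and the squared-error anomaly score are assumptions chosen for illustration; the paper's actual condensation and reconstruction design may differ.

```python
# Sketch (an assumption, not the paper's code): score text tokens, keep the
# k most informative, and reconstruct time patches from the condensed text
# via a decoder; point-wise reconstruction error is the anomaly score.
import torch
import torch.nn as nn

class CondenserReconstruction(nn.Module):
    def __init__(self, d_model=128, k=8, patch_len=4):
        super().__init__()
        self.k = k
        self.scorer = nn.Linear(d_model, 1)           # informativeness per token
        self.decoder = nn.TransformerDecoderLayer(
            d_model, nhead=4, batch_first=True)
        self.head = nn.Linear(d_model, patch_len)     # patch features -> values

    def forward(self, aligned_time, text_tokens):
        # Condense: keep the k highest-scoring text tokens, drop the rest
        scores = self.scorer(text_tokens).squeeze(-1)          # (B, L)
        idx = scores.topk(self.k, dim=1).indices               # (B, k)
        condensed = torch.gather(
            text_tokens, 1,
            idx.unsqueeze(-1).expand(-1, -1, text_tokens.size(-1)))
        # Cross-modal reconstruction: decode time patches from condensed text
        dec = self.decoder(aligned_time, condensed)            # (B, P, d_model)
        return self.head(dec)                                  # (B, P, patch_len)

def anomaly_score(recon, target):
    # Point-wise squared reconstruction error as the anomaly score
    return (recon - target).pow(2).mean(dim=-1)                # (B, P)
```

Top-k selection is only one plausible reading of "filtering redundant information"; attention-based pooling or learned gating would fit the same description.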
Problem

Research questions and friction points this paper is trying to address.

multimodal
time series anomaly detection
semantic alignment
cross-modal interaction
redundant information
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal Time Series
Semantic Alignment
Content Condenser
Anomaly Detection
Cross-modal Interaction