Time-IMM: A Dataset and Benchmark for Irregular Multimodal Multivariate Time Series

📅 2025-06-12
🤖 AI Summary
Real-world time series often exhibit irregular sampling, asynchronous modalities, and high missingness rates. Existing benchmarks assume regular, unimodal, and complete data, so they capture these conditions poorly and widen the gap between theory and practice. To bridge this gap, we introduce Time-IMM, the first cause-driven, multimodal multivariate time series dataset explicitly designed for irregularity, covering nine distinct types of real-world irregularity. We further release IMM-TSF, an open-source benchmark library supporting asynchronous fusion and scenario-aware evaluation. Our work is the first to systematically model three fundamental categories of irregularity mechanisms: trigger-based, constraint-based, and artifact-based. Methodologically, we propose a timestamp-to-text encoding, a multimodal asynchronous fusion module, and fusion strategies based on recency-aware averaging and attention. Experiments demonstrate that explicit modeling of irregular multimodal structure substantially improves prediction robustness, yielding an average 12.7% reduction in MAE across diverse real-world scenarios.

📝 Abstract
Time series data in real-world applications such as healthcare, climate modeling, and finance are often irregular, multimodal, and messy, with varying sampling rates, asynchronous modalities, and pervasive missingness. However, existing benchmarks typically assume clean, regularly sampled, unimodal data, creating a significant gap between research and real-world deployment. We introduce Time-IMM, a dataset specifically designed to capture cause-driven irregularity in multimodal multivariate time series. Time-IMM represents nine distinct types of time series irregularity, categorized into trigger-based, constraint-based, and artifact-based mechanisms. Complementing the dataset, we introduce IMM-TSF, a benchmark library for forecasting on irregular multimodal time series, enabling asynchronous integration and realistic evaluation. IMM-TSF includes specialized fusion modules, including a timestamp-to-text fusion module and a multimodality fusion module, which support both recency-aware averaging and attention-based integration strategies. Empirical results demonstrate that explicitly modeling multimodality on irregular time series data leads to substantial gains in forecasting performance. Time-IMM and IMM-TSF provide a foundation for advancing time series analysis under real-world conditions. The dataset is publicly available at https://www.kaggle.com/datasets/blacksnail789521/time-imm/data, and the benchmark library can be accessed at https://anonymous.4open.science/r/IMMTSF_NeurIPS2025.
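The abstract mentions a timestamp-to-text fusion module but does not describe its implementation. As a rough illustration of the general idea only, the sketch below turns irregular observation times into short sentences that a text encoder could consume. The function name `textualize_timestamps` and the sentence template are assumptions made for this example and are not taken from IMM-TSF.

```python
from datetime import datetime
from typing import List


def textualize_timestamps(stamps: List[datetime]) -> List[str]:
    """Describe each observation time, and the gap since the previous one, as text.

    Hypothetical helper for illustration; not part of the IMM-TSF library.
    """
    lines: List[str] = []
    prev = None
    for ts in stamps:
        text = f"Observed at {ts.isoformat(sep=' ', timespec='minutes')}"
        if prev is not None:
            # Irregular sampling shows up as a varying gap between observations.
            gap_hours = (ts - prev).total_seconds() / 3600
            text += f", {gap_hours:.1f} hours after the previous observation"
        lines.append(text + ".")
        prev = ts
    return lines


if __name__ == "__main__":
    for line in textualize_timestamps(
        [datetime(2025, 1, 1, 8, 0), datetime(2025, 1, 1, 14, 30)]
    ):
        print(line)
    # Observed at 2025-01-01 08:00.
    # Observed at 2025-01-01 14:30, 6.5 hours after the previous observation.
```

Writing the gap into the sentence makes the sampling irregularity itself visible to a text model, which is one plausible motivation for encoding timestamps as text.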
Problem

Research questions and friction points this paper is trying to address.

Addresses irregular, multimodal, multivariate time series data
Introduces the Time-IMM dataset for real-world irregularity scenarios
Proposes the IMM-TSF benchmark for forecasting on messy time series
Innovation

Methods, ideas, or system contributions that make the work stand out.

The Time-IMM dataset captures cause-driven irregularity
The IMM-TSF benchmark enables asynchronous multimodal fusion
Specialized fusion modules support recency-aware averaging and attention-based integration
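To make recency-aware averaging concrete, here is a minimal sketch under stated assumptions: text embeddings arrive at irregular times, and each one is weighted by an exponential decay in its age relative to the forecast time. The function name `recency_weighted_average`, the exponential kernel, and the `tau` parameter are illustrative choices for this example; the source does not specify the weighting used in IMM-TSF.

```python
import math
from typing import List, Sequence


def recency_weighted_average(
    timestamps: Sequence[float],
    embeddings: Sequence[Sequence[float]],
    query_time: float,
    tau: float = 1.0,
) -> List[float]:
    """Average past embeddings, giving more weight to recent ones.

    Hypothetical helper for illustration; not part of the IMM-TSF library.
    """
    if len(timestamps) != len(embeddings):
        raise ValueError("timestamps and embeddings must have the same length")
    # Use only observations at or before the query time (no look-ahead).
    past = [(t, e) for t, e in zip(timestamps, embeddings) if t <= query_time]
    if not past:
        raise ValueError("no observations at or before query_time")
    # Assumed kernel: weight = exp(-(age) / tau), so older items count less.
    weights = [math.exp(-(query_time - t) / tau) for t, _ in past]
    total = sum(weights)
    dim = len(past[0][1])
    return [
        sum(w * e[d] for w, (_, e) in zip(weights, past)) / total
        for d in range(dim)
    ]


if __name__ == "__main__":
    fused = recency_weighted_average(
        timestamps=[0.0, 1.0, 5.0],          # the observation at t=5 is in the future
        embeddings=[[0.0, 0.0], [1.0, 1.0], [9.0, 9.0]],
        query_time=1.0,
    )
    print(fused)  # both components equal 1 / (1 + e^-1), roughly 0.731
```

An attention-based alternative would replace the fixed decay weights with learned, content-dependent scores; the averaging variant shown here has no trainable parameters beyond the choice of `tau`.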