Fidel-TS: A High-Fidelity Benchmark for Multimodal Time Series Forecasting

📅 2025-09-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Current time-series forecasting evaluation is compromised by low-quality benchmarks that suffer from pretraining data contamination, causal leakage, and cross-modal descriptive information leakage, all of which lead to spurious performance gains. Method: We propose a “high-fidelity benchmark” paradigm and formally define three foundational principles: data-source reliability, causal rigor, and modality-structure clarity. Using real-time API collection, we construct Fidel-TS, a large-scale multimodal time-series benchmark, and introduce causal isolation mechanisms and strict cross-modal alignment strategies. Contribution/Results: Empirical analysis reveals substantial evaluation bias in existing benchmarks. In contrast, Fidel-TS exposes models’ generalization failures under realistic conditions, establishing the first causally credible evaluation standard for multimodal time-series forecasting.

📝 Abstract

The evaluation of time series forecasting models is hindered by a critical lack of high-quality benchmarks, leading to a potential illusion of progress. Existing datasets suffer from issues ranging from pre-training data contamination in the age of LLMs to the causal and description leakage prevalent in early multimodal designs. To address this, we formalize the core principles of high-fidelity benchmarking, focusing on data sourcing integrity, strict causal soundness, and structural clarity. We introduce Fidel-TS, a new large-scale benchmark built from the ground up on these principles by sourcing data from live APIs. Our extensive experiments validate this approach by exposing the critical biases and design limitations of prior benchmarks. Furthermore, we conclusively demonstrate that the causal relevance of textual information is the key factor in unlocking genuine performance gains in multimodal forecasting.

Problem

Research questions and friction points this paper is trying to address.

Addressing the lack of high-quality time series forecasting benchmarks

Resolving data contamination and causal leakage in multimodal datasets

Establishing causal relevance of textual data for forecasting gains

Innovation

Methods, ideas, or system contributions that make the work stand out.

Built the benchmark by sourcing data from live APIs

Enforced causal soundness and structural clarity

Demonstrated that the causal relevance of text drives multimodal gains
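
To make "causal soundness" concrete, here is a minimal sketch of one way a leakage check could look. It is not the paper's mechanism, which this summary does not describe. The `TextRecord` type, the `causally_valid` helper, and the sample records are all hypothetical and assume only that each text item carries a publication timestamp.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class TextRecord:
    """Hypothetical text observation paired with a time series (e.g. a news item)."""
    published_at: datetime
    body: str


def causally_valid(records, forecast_origin):
    """Keep only text published at or before the forecast origin.

    Assumption: anything published later could describe values inside the
    forecast horizon and would therefore leak the answer to the model.
    """
    return [r for r in records if r.published_at <= forecast_origin]


# Illustrative data, invented for this sketch.
records = [
    TextRecord(datetime(2025, 1, 1, 8), "Forecast issued: heavy rain expected tomorrow."),
    TextRecord(datetime(2025, 1, 2, 9), "Report: yesterday's demand peaked at noon."),
]
origin = datetime(2025, 1, 1, 12)
usable = causally_valid(records, origin)
```

Only the first record survives the filter. The second was published after the forecast origin and describes what happened inside the horizon, which is the kind of description leakage the abstract attributes to earlier multimodal datasets.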

🔎 Similar Papers

Time-MMD: Multi-Domain Multimodal Dataset for Time Series Analysis