Fidel-TS: A High-Fidelity Benchmark for Multimodal Time Series Forecasting

📅 2025-09-29
🤖 AI Summary
Problem: Time-series forecasting evaluation is compromised by low-quality benchmarks that suffer from pretraining data contamination, causal leakage, and cross-modal descriptive information leakage, all of which produce spurious performance gains.
Method: We propose a “high-fidelity benchmark” paradigm and formally define three foundational principles: data-source reliability, causal rigor, and modality-structure clarity. Using data collected from real-time APIs, we construct Fidel-TS, a large-scale multimodal time-series benchmark, and introduce causal isolation mechanisms and strict cross-modal alignment strategies.
Contribution/Results: Empirical analysis reveals substantial evaluation bias in existing benchmarks. In contrast, Fidel-TS exposes models’ generalization failures under realistic conditions, establishing the first causally credible evaluation standard for multimodal time-series forecasting.

📝 Abstract
The evaluation of time series forecasting models is hindered by a critical lack of high-quality benchmarks, leading to a potential illusion of progress. Existing datasets suffer from issues ranging from pre-training data contamination in the age of LLMs to the causal and description leakage prevalent in early multimodal designs. To address this, we formalize the core principles of high-fidelity benchmarking, focusing on data sourcing integrity, strict causal soundness, and structural clarity. We introduce Fidel-TS, a new large-scale benchmark built from the ground up on these principles by sourcing data from live APIs. Our extensive experiments validate this approach by exposing the critical biases and design limitations of prior benchmarks. Furthermore, we conclusively demonstrate that the causal relevance of textual information is the key factor in unlocking genuine performance gains in multimodal forecasting.
Problem

Research questions and friction points this paper is trying to address.

Addressing the lack of high-quality time series forecasting benchmarks
Resolving data contamination and causal leakage in multimodal datasets
Establishing causal relevance of textual data for forecasting gains
Innovation

Methods, ideas, or system contributions that make the work stand out.

Built benchmark using live API data sourcing
Enforced causal soundness and structural clarity
Demonstrated textual causality drives multimodal gains
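
To make "causal soundness" concrete, here is a minimal sketch of one way to keep future-describing text out of a forecast input. It is an illustration under stated assumptions, not the mechanism used in Fidel-TS: the `TextRecord` type, its `available_at` field, and the `causally_valid_texts` helper are hypothetical names introduced here, and the rule shown (drop any text not available strictly before the forecast origin) is only the simplest form such a filter could take.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class TextRecord:
    """Hypothetical text observation paired with the time it became available."""
    available_at: datetime
    text: str


def causally_valid_texts(records, forecast_origin):
    """Keep only text that was available strictly before the forecast origin.

    Anything published at or after the origin could describe the values
    being predicted, so it is excluded from the model input.
    """
    return [r for r in records if r.available_at < forecast_origin]


# Illustrative data (made up for this sketch).
records = [
    TextRecord(datetime(2025, 1, 1, 8), "Weather forecast issued in the morning"),
    TextRecord(datetime(2025, 1, 1, 18), "Report describing the afternoon outcome"),
]
origin = datetime(2025, 1, 1, 12)
visible = causally_valid_texts(records, origin)
```

With a forecast origin at noon, only the morning record survives the filter; the evening report, which describes what happened during the forecast window, is excluded.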
Zhijian Xu (University of Science and Technology of China; Natural Language Processing)
Wanxu Cai (School of Software, Tsinghua University)
Xilin Dai (ZJU-UIUC Institute, Zhejiang University)
Zhaorong Deng (School of Data Science, The Chinese University of Hong Kong, Shenzhen)
Qiang Xu (Department of Computer Science and Engineering, The Chinese University of Hong Kong)