TempusBench: An Evaluation Framework for Time-Series Forecasting

📅 2026-04-13
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This work addresses the absence of a unified, comprehensive, and community-recognized evaluation framework for time series foundation models, as existing benchmarks often suffer from outdated data, limited task diversity, inconsistent hyperparameter tuning, and lack of visualization. To bridge this gap, the authors introduce an open-source evaluation framework that integrates novel, non-overlapping datasets; multidimensional forecasting tasks capturing statistical properties such as non-stationarity and seasonality; a standardized hyperparameter optimization protocol; and a TensorBoard-based visualization interface. This framework enables, for the first time, systematic evaluation that simultaneously ensures data recency, task diversity, tuning fairness, and result interpretability, thereby supporting fair, fine-grained, and reproducible performance comparisons between domain-specific models and foundation models, while providing the community with an extensible evaluation infrastructure.
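As a rough illustration of what characterizing a series by seasonality and non-stationarity can look like, the sketch below computes two simple diagnostics using only the Python standard library. The function names (`autocorrelation`, `dominant_period`, `mean_shift`) and the synthetic series are hypothetical and are not taken from TempusBench; the framework may define its task properties quite differently.

```python
from statistics import mean

def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given lag."""
    mu = mean(series)
    denom = sum((x - mu) ** 2 for x in series)
    num = sum((series[t] - mu) * (series[t - lag] - mu)
              for t in range(lag, len(series)))
    return num / denom

def dominant_period(series, max_lag):
    """Lag in [2, max_lag] with the highest autocorrelation (a crude seasonality proxy)."""
    return max(range(2, max_lag + 1), key=lambda lag: autocorrelation(series, lag))

def mean_shift(series):
    """Difference between second-half and first-half means (a crude non-stationarity proxy)."""
    half = len(series) // 2
    return mean(series[half:]) - mean(series[:half])

# Synthetic, illustrative data only.
seasonal = [10, 20, 30, 40] * 6   # repeats every 4 steps, no trend
trending = list(range(24))        # linear trend, no seasonality

print(dominant_period(seasonal, max_lag=6))
print(mean_shift(seasonal), mean_shift(trending))
```

Diagnostics like these could be used to tag each benchmark series, so that results can be broken down by statistical property rather than only by domain or horizon.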

📝 Abstract
Foundation models have transformed natural language processing and computer vision, and a rapidly growing literature on time-series foundation models (TSFMs) seeks to replicate this success in forecasting. While recent open-source models demonstrate the promise of TSFMs, the field lacks a comprehensive and community-accepted model evaluation framework. We see at least four major issues impeding progress on the development of such a framework. First, current evaluation frameworks consist of benchmark forecasting tasks derived from often outdated datasets (e.g., M3), many of which lack clear metadata and overlap with the corpora used to pre-train TSFMs. Second, existing frameworks evaluate models along a narrowly defined set of task dimensions, such as forecast horizon length or domain, but overlook core statistical properties such as non-stationarity and seasonality. Third, domain-specific models (e.g., XGBoost) are often compared unfairly, as existing frameworks neglect a systematic and consistent hyperparameter tuning convention for all models. Fourth, visualization tools for interpreting comparative performance are lacking. To address these issues, we introduce TempusBench, an open-source evaluation framework for TSFMs. TempusBench consists of 1) new datasets that are not included in existing TSFM pretraining corpora, 2) a set of novel benchmark tasks that go beyond existing ones, 3) a model evaluation pipeline with a standardized hyperparameter tuning protocol, and 4) a TensorBoard-based visualization interface. We provide access to our code on GitHub: https://github.com/Smlcrm/TempusBench.
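To make the idea of a consistent tuning convention concrete, here is a minimal sketch in which every candidate model is tuned with the same trial budget on the same train/validation split. It uses only the Python standard library; the helper `tune`, the two toy forecasters, the search spaces, and the synthetic series are all hypothetical and do not reflect the actual TempusBench pipeline or its API.

```python
import itertools
import random
from statistics import mean

def seasonal_naive(history, horizon, season):
    """Repeat the last observed cycle of length `season`."""
    start = len(history) - season
    return [history[start + (h % season)] for h in range(horizon)]

def moving_average(history, horizon, window):
    """Forecast the mean of the last `window` points at every step."""
    level = mean(history[-window:])
    return [level] * horizon

def mae(actual, predicted):
    """Mean absolute error."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))

def tune(model, space, train, valid, budget, seed):
    """Evaluate at most `budget` distinct configurations; return (best_params, best_score)."""
    names = list(space)
    grid = [dict(zip(names, combo)) for combo in itertools.product(*space.values())]
    trials = random.Random(seed).sample(grid, min(budget, len(grid)))
    best_params, best_score = None, float("inf")
    for params in trials:
        score = mae(valid, model(train, len(valid), **params))
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Synthetic series with period 4 (illustrative data only).
series = [10, 20, 30, 40] * 6
train, valid = series[:16], series[16:]

candidates = {
    "seasonal_naive": (seasonal_naive, {"season": [2, 3, 4, 6]}),
    "moving_average": (moving_average, {"window": [2, 4, 8]}),
}
# Same budget, seed, and split for every model, so no model gets extra tuning effort.
results = {
    name: tune(fn, space, train, valid, budget=4, seed=0)
    for name, (fn, space) in candidates.items()
}
print(results)
```

The point of the sketch is the protocol rather than the models: fixing the budget, the split, and the metric for all candidates removes one common source of unfair comparison between tuned and untuned baselines.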
Problem

Research questions and friction points this paper is trying to address.

time-series forecasting
foundation models
evaluation framework
benchmarking
hyperparameter tuning
Innovation

Methods, ideas, or system contributions that make the work stand out.

time-series foundation models
evaluation framework
hyperparameter tuning
non-stationarity
benchmark tasks
Denizalp Goktas
Postdoctoral Researcher, Cornell Tech
Multiagent Learning, Algorithmic Game Theory, Optimization, Economics, Artificial Intelligence
Gerardo Riaño-Briceño
Simulacrum, New York City, NY, USA
Alif Abdullah
Simulacrum, New York City, NY, USA
Aryan Nair
Simulacrum, New York City, NY, USA
Chenkai Shen
Simulacrum, New York City, NY, USA
Beatriz de Lucio
Simulacrum, New York City, NY, USA
Alexandra Magnusson
Simulacrum, New York City, NY, USA
Farhan Mashrur
Simulacrum, New York City, NY, USA
Ahmed Abdulla
Simulacrum, New York City, NY, USA
Shawrna Sen
Simulacrum, New York City, NY, USA
Mahitha Thippireddy
Simulacrum, New York City, NY, USA
Gregory Schwartz
Simulacrum, New York City, NY, USA
Amy Greenwald
Professor of Computer Science, Brown University
Artificial Intelligence, Multi-agent Systems, Internet Economics