EpiCastBench: Datasets and Benchmarks for Multivariate Epidemic Forecasting

📅 2026-05-12
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This study addresses the critical scarcity of high-quality, diverse multivariate benchmark datasets in epidemic forecasting research. To this end, it presents the first systematic integration of 40 multivariate epidemiological datasets spanning multiple infectious diseases and geographic regions. The authors establish a unified preprocessing pipeline and a standardized evaluation protocol, enabling a comprehensive and reproducible assessment of 15 state-of-the-art models, including statistical methods, deep learning architectures, and foundation models. Rigorous evaluation is ensured through multiple performance metrics and statistical significance tests. All data and code have been publicly released, providing an authoritative benchmark to advance data-driven public health decision-making in epidemic prediction.
📝 Abstract
The increasing adoption of data-driven decision-making in public health has established epidemic forecasting as a critical area of research. Recent advances in multivariate forecasting models better capture complex temporal dependencies than conventional univariate approaches, which model individual series independently. Despite this potential, the development of robust epidemic forecasting methods is constrained by the lack of high-quality benchmarks comprising diverse multivariate datasets across infectious diseases and geographical regions. To address this gap, we present EpiCastBench, a large-scale benchmarking framework featuring 40 curated (correlated) multivariate epidemic datasets. These publicly available datasets span a wide range of infectious diseases and exhibit diverse characteristics in terms of temporal granularity, series length, and sparsity. We analyze these datasets to identify their global features and structural patterns. To ensure reproducibility and fair comparison, we establish standardized evaluation settings, including a unified forecasting horizon, consistent preprocessing pipelines, diverse performance metrics, and statistical significance testing. By leveraging this framework, we conduct a comprehensive evaluation of 15 multivariate forecasting models spanning statistical baselines to state-of-the-art deep learning and foundation models. All datasets and code are publicly available on Kaggle (https://www.kaggle.com/datasets/aimltsf/epicastbench) and GitHub (https://github.com/aimltsf/EpiCastBench).
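The standardized evaluation settings described in the abstract (a unified forecasting horizon and several performance metrics applied to every model) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy series, the two baseline forecasters, the function names, and the choice of MAE and RMSE are not taken from the EpiCastBench code, and the statistical significance testing step is left out.

```python
import math


def split_holdout(series, horizon):
    """Hold out the last `horizon` observations of every variable for testing."""
    train = {name: values[:-horizon] for name, values in series.items()}
    test = {name: values[-horizon:] for name, values in series.items()}
    return train, test


def naive_forecast(train, horizon):
    """Hypothetical baseline: repeat the last observed value of each variable."""
    return {name: [values[-1]] * horizon for name, values in train.items()}


def mean_forecast(train, horizon):
    """Hypothetical baseline: repeat the training mean of each variable."""
    return {
        name: [sum(values) / len(values)] * horizon
        for name, values in train.items()
    }


def mae(actual, predicted):
    """Mean absolute error over one forecast window."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error over one forecast window."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def evaluate(series, horizon, models, metrics):
    """Score every model with every metric on the same holdout window.

    Each metric is computed per variable and then averaged across variables,
    so all models are compared under identical settings.
    """
    train, test = split_holdout(series, horizon)
    results = {}
    for model_name, model in models.items():
        forecast = model(train, horizon)
        results[model_name] = {
            metric_name: sum(metric(test[n], forecast[n]) for n in test) / len(test)
            for metric_name, metric in metrics.items()
        }
    return results


# Toy multivariate dataset (made-up case counts for two regions).
toy = {
    "region_a": [10, 12, 15, 14, 18, 20, 21, 19],
    "region_b": [3, 4, 4, 6, 7, 7, 9, 8],
}

scores = evaluate(
    toy,
    horizon=2,
    models={"naive": naive_forecast, "mean": mean_forecast},
    metrics={"MAE": mae, "RMSE": rmse},
)
print(scores)
```

Fixing the horizon and the metric set in one `evaluate` call is the design point here: every model sees the same training window and is scored on the same held-out values, which is what makes the comparison fair.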
Problem

Research questions and friction points this paper is trying to address.

epidemic forecasting
multivariate time series
benchmarking
public health
data-driven decision-making
Innovation

Methods, ideas, or system contributions that make the work stand out.

multivariate epidemic forecasting
benchmarking framework
EpiCastBench
temporal dependencies
reproducible evaluation