A Realistic Evaluation of Cross-Frequency Transfer Learning and Foundation Forecasting Models

📅 2025-09-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing cross-frequency transfer learning (CFTL) evaluations of foundation forecasting models (FFMs) suffer from four critical flaws: reliance on small test sets, improper sample-size accounting, reporting of suboptimal baselines, and neglect of pretraining-test data overlap risks. This paper introduces a rigorously isolated evaluation framework that pretrains only on proprietary and synthetic data, strictly separated from the test data, and uniformly reimplements state-of-the-art neural forecasting models, evaluating them on 15 large-scale public time series datasets to rule out data leakage. Key contributions include: (i) correcting sample-size bias in evaluation; (ii) demonstrating that statistical models, including their ensembles, outperform current FFMs by more than 8.2% in scaled Continuous Ranked Probability Score (sCRPS) and more than 20% in Mean Absolute Scaled Error (MASE); and (iii) providing empirical evidence that synthetic-data pretraining improves FFM accuracy by 7%. These findings reveal systematic overestimation of FFM performance and motivate more trustworthy forecasting benchmarks.

📝 Abstract
Cross-frequency transfer learning (CFTL) has emerged as a popular framework for curating large-scale time series datasets to pre-train foundation forecasting models (FFMs). Although CFTL has shown promise, current benchmarking practices fall short of accurately assessing its performance. This shortcoming stems from many factors: an over-reliance on small-scale evaluation datasets; inadequate treatment of sample size when computing summary statistics; reporting of suboptimal statistical models; and failing to account for non-negligible risks of overlap between pre-training and test datasets. To address these limitations, we introduce a unified reimplementation of widely-adopted neural forecasting networks, adapting them for the CFTL setup; we pre-train only on proprietary and synthetic data, being careful to prevent test leakage; and we evaluate on 15 large, diverse public forecast competition datasets. Our empirical analysis reveals that statistical models' accuracy is frequently underreported. Notably, we confirm that statistical models and their ensembles consistently outperform existing FFMs by more than 8.2% in sCRPS, and by more than 20% in MASE, across datasets. However, we also find that synthetic dataset pre-training does improve the accuracy of an FFM by 7%.
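The abstract reports performance gaps in sCRPS and MASE. As a reminder of what these metrics measure, below is a minimal sketch of both: MASE scales the test-set MAE by the in-sample MAE of a seasonal-naive forecast, and sCRPS is approximated here by averaging pinball losses over a quantile grid and normalizing by the mean absolute target value. This is an illustrative implementation under those common definitions, not the paper's exact evaluation code; the function names and the quantile-grid approximation are assumptions.

```python
import numpy as np

def mase(y_train, y_test, y_pred, season=1):
    """MASE: test-set MAE scaled by the in-sample MAE of a
    seasonal-naive forecast (one common textbook definition)."""
    y_train, y_test, y_pred = map(np.asarray, (y_train, y_test, y_pred))
    # Scale: average absolute seasonal difference on the training series.
    scale = np.mean(np.abs(y_train[season:] - y_train[:-season]))
    return np.mean(np.abs(y_test - y_pred)) / scale

def scrps(y_test, q_pred, quantiles):
    """Scaled CRPS sketch: CRPS approximated by averaged pinball
    (quantile) losses over a grid of quantile levels, normalized by
    the mean absolute target. q_pred has shape (n_quantiles, horizon)."""
    y, q_pred = np.asarray(y_test), np.asarray(q_pred)
    qs = np.asarray(quantiles)[:, None]
    diff = y[None, :] - q_pred
    # Pinball loss penalizes under- and over-prediction asymmetrically.
    pinball = np.maximum(qs * diff, (qs - 1.0) * diff)
    crps = 2.0 * pinball.mean()
    return crps / np.mean(np.abs(y))
```

A MASE above 1 means the forecast is worse, on average, than a seasonal-naive baseline; scaling by mean absolute value makes sCRPS comparable across series of different magnitudes, which matters when averaging over 15 heterogeneous datasets.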
Problem

Research questions and friction points this paper is trying to address.

Accurately evaluating cross-frequency transfer learning performance in forecasting
Addressing limitations in current benchmarking practices for foundation models
Assessing the forecasting accuracy of statistical models versus neural networks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified reimplementation of neural forecasting networks for CFTL
Pre-training only on proprietary and synthetic data sources
Evaluation across 15 large diverse public forecast datasets