Beyond Model Ranking: Predictability-Aligned Evaluation for Time Series Forecasting

📅 2025-09-26
🤖 AI Summary
Current time-series forecasting evaluation suffers from a fundamental flaw: conventional metrics conflate model performance with intrinsic data predictability, yielding biased assessments. To address this, we propose a spectral-coherence-based, predictability-aligned evaluation framework. It introduces the Spectral Coherence Predictability (SCP) score and the Linear Utilization Ratio (LUR) diagnostic tool, revealing for the first time the phenomenon of "predictability drift." Our method integrates the fast Fourier transform (O(N log N)), frequency-resolved analysis, and linear system modeling to quantify task-inherent difficulty and to assess how efficiently models exploit the available information. Experiments demonstrate complementary strengths: complex models excel on low-SCP subtasks, while linear models dominate high-SCP regimes. This framework shifts evaluation from static ranking to predictability-aware dynamic diagnosis, enabling principled model selection and targeted improvement.

📝 Abstract
In the era of increasingly complex AI models for time series forecasting, progress is often measured by marginal improvements on benchmark leaderboards. However, this approach suffers from a fundamental flaw: standard evaluation metrics conflate a model's performance with the data's intrinsic unpredictability. To address this pressing challenge, we introduce a novel, predictability-aligned diagnostic framework grounded in spectral coherence. Our framework makes two primary contributions: the Spectral Coherence Predictability (SCP), a computationally efficient ($O(N \log N)$) and task-aligned score that quantifies the inherent difficulty of a given forecasting instance, and the Linear Utilization Ratio (LUR), a frequency-resolved diagnostic tool that precisely measures how effectively a model exploits the linearly predictable information within the data. We validate our framework's effectiveness and leverage it to reveal two core insights. First, we provide the first systematic evidence of "predictability drift", demonstrating that a task's forecasting difficulty varies sharply over time. Second, our evaluation reveals a key architectural trade-off: complex models are superior for low-predictability data, whereas linear models are highly effective on more predictable tasks. We advocate for a paradigm shift, moving beyond simplistic aggregate scores toward a more insightful, predictability-aware evaluation that fosters fairer model comparisons and a deeper understanding of model behavior.
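The abstract describes SCP as an FFT-based ($O(N \log N)$) score built on spectral coherence. The paper's exact definition is not reproduced here, but the underlying quantity can be sketched with `scipy.signal.coherence` (Welch's FFT-based estimator). The sinusoid-plus-noise series, `nperseg=256`, and the mean aggregation below are illustrative assumptions, not the authors' construction:

```python
import numpy as np
from scipy.signal import coherence

rng = np.random.default_rng(0)
n = 4096
t = np.arange(n)

# Hypothetical pair of series sharing one predictable component:
# a sinusoid observed twice under independent noise, standing in for
# two views of the same underlying process.
common = np.sin(2 * np.pi * t / 64)
x = common + 0.5 * rng.standard_normal(n)
y = common + 0.5 * rng.standard_normal(n)

# Magnitude-squared coherence per frequency via Welch's FFT-based
# estimator (hence the O(N log N) cost noted in the abstract).
f, cxy = coherence(x, y, nperseg=256)

# A coherence-based scalar in [0, 1]: high in bands where the shared
# sinusoid lives, near zero in noise-only bands. This plain mean is
# only an illustrative aggregate, not the paper's SCP definition.
scp_like = float(np.mean(cxy))
print(round(scp_like, 3))
```

Coherence is bounded in $[0, 1]$ per frequency, which is what makes it a natural normalizer for "how much of this task is linearly predictable at all."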
Problem

Research questions and friction points this paper is trying to address.

Evaluating forecasting models without conflating performance with intrinsic data unpredictability
Building a predictability-aligned evaluation framework from spectral-coherence diagnostics
Revealing predictability drift and architectural trade-offs in model performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Spectral Coherence Predictability quantifies inherent forecasting difficulty
Linear Utilization Ratio measures model's exploitation of predictable information
Framework enables predictability-aware evaluation for fair model comparisons
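The LUR is described as a frequency-resolved measure of how well a model exploits the linearly predictable part of the signal. As a hedged sketch of that idea (not the paper's exact definition), one can compare the residual spectrum of a forecast against the target spectrum, band by band; the toy target, the two hypothetical forecasts, and the power-weighted aggregation are all assumptions made for illustration:

```python
import numpy as np
from scipy.signal import welch

rng = np.random.default_rng(1)
n = 4096
t = np.arange(n)
target = np.sin(2 * np.pi * t / 32) + 0.3 * rng.standard_normal(n)

# Two hypothetical forecasts: one recovers the linearly predictable
# (sinusoidal) component fully, the other only partially.
full_pred = np.sin(2 * np.pi * t / 32)
partial_pred = 0.4 * np.sin(2 * np.pi * t / 32)

def utilization(pred, target, nperseg=256):
    """Frequency-resolved utilization: 1 - residual PSD / target PSD,
    clipped to [0, 1] per bin and aggregated with target-power weights.
    An illustrative proxy for the LUR, not the paper's definition."""
    _, p_t = welch(target, nperseg=nperseg)
    _, p_r = welch(target - pred, nperseg=nperseg)
    u = np.clip(1.0 - p_r / (p_t + 1e-12), 0.0, 1.0)
    return float(np.sum(u * p_t) / np.sum(p_t))

print(round(utilization(full_pred, target), 3))
print(round(utilization(partial_pred, target), 3))
```

A forecast that captures the predictable component scores higher than one that leaves part of it in the residual, which is the diagnostic contrast the LUR is meant to surface.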
Wanjin Feng
Tsinghua University
Yuan Yuan
Tsinghua University
Jingtao Ding
Tsinghua University
Spatio-temporal Data Mining · Complex Networks · Synthetic Data · Recommender Systems
Yong Li
Tsinghua University