Probabilistic measures afford fair comparisons of AIWP and NWP model output

📅 2025-06-04
🤖 AI Summary
Fair comparison of single-valued forecasts from AI-based weather prediction (AIWP) and numerical weather prediction (NWP) models is hindered when evaluation relies on a pre-specified loss function, which can place competitors on unequal footing. Method: The authors propose the potential continuous ranked probability score (PC). The deterministic output of both AIWP and NWP models is subjected post hoc to the same statistical postprocessing technique, isotonic distributional regression (IDR), and PC is the mean continuous ranked probability score (CRPS) of the resulting probabilistic forecasts. Contribution/Results: PC is nonnegative, is reported in the unit of the outcome, and is invariant under strictly increasing transformations of the model output. It equals zero if, and only if, the outcome is a fixed, non-decreasing function of the model output, and it is bounded above by one half times the mean absolute difference between outcomes. On WeatherBench 2 data, GraphCast attains a lower PC than ECMWF’s HRES model, and the PC of HRES aligns closely with the mean CRPS of the operational ECMWF ensemble. PC thus serves as a proxy for the mean CRPS of operational probabilistic products and supports comparisons across forecasting paradigms.

📝 Abstract
We introduce a new measure for fair and meaningful comparisons of single-valued output from artificial intelligence-based weather prediction (AIWP) and numerical weather prediction (NWP) models, called potential continuous ranked probability score (PC). In a nutshell, we subject the deterministic backbone of physics-based and data-driven models post hoc to the same statistical postprocessing technique, namely, isotonic distributional regression (IDR). Then we find PC as the mean continuous ranked probability score (CRPS) of the postprocessed probabilistic forecasts. The nonnegative PC measure quantifies potential predictive performance and is invariant under strictly increasing transformations of the model output. PC attains its most desirable value of zero if, and only if, the weather outcome Y is a fixed, non-decreasing function of the model output X. The PC measure is recorded in the unit of the outcome, has an upper bound of one half times the mean absolute difference between outcomes, and serves as a proxy for the mean CRPS of real-time, operational probabilistic products. When applied to WeatherBench 2 data, our approach demonstrates that the data-driven GraphCast model outperforms the leading, physics-based European Centre for Medium-Range Weather Forecasts (ECMWF) high-resolution (HRES) model. Furthermore, the PC measure for the HRES model aligns exceptionally well with the mean CRPS of the operational ECMWF ensemble. Across application domains, our approach affords comparisons of single-valued forecasts in settings where the pre-specification of a loss function (the usual, and in principle superior, procedure in forecast contests, administrative settings, and benchmarks) places competitors on unequal footing.
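
The quantities named in the abstract can be written compactly as follows. The notation is an assumption made here for illustration and is not taken from the paper: n training pairs (x_i, y_i), the IDR-fitted predictive CDF \hat{F}_{x_i} for model output x_i, and the mean absolute difference taken over all n^2 ordered pairs of outcomes, with PC evaluated on the same pairs used for the fit.

```latex
\mathrm{CRPS}(F, y) = \int_{-\infty}^{\infty} \bigl( F(z) - \mathbb{1}\{ y \le z \} \bigr)^2 \, dz,
\qquad
\mathrm{PC} = \frac{1}{n} \sum_{i=1}^{n} \mathrm{CRPS}\bigl( \hat{F}_{x_i}, y_i \bigr),
\qquad
0 \le \mathrm{PC} \le \frac{1}{2 n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} \lvert y_i - y_j \rvert .
```

Under this reading, the upper bound is attained when the model output carries no usable ordering information, in which case every fitted CDF equals the empirical distribution of the outcomes.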
Problem

Research questions and friction points this paper is trying to address.

Develops a measure for comparing AI and numerical weather models
Evaluates predictive performance using probabilistic postprocessing techniques
Compares GraphCast and ECMWF models using WeatherBench 2 data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces potential continuous ranked probability score (PC)
Uses isotonic distributional regression (IDR) technique
Compares AIWP and NWP models fairly
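
The PC recipe (fit IDR, then average the CRPS of the fitted distributions) can be sketched in a few lines of pure Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `pava_nondecreasing` and `potential_crps` are hypothetical, PC is computed in-sample on the same pairs used for the fit, and IDR is obtained threshold by threshold by regressing the indicator 1{y <= z} on the model output under a non-increasing constraint.

```python
def pava_nondecreasing(values, weights):
    """Weighted pool-adjacent-violators fit, non-decreasing in index."""
    blocks = []  # each block: [mean, total weight, number of pooled entries]
    for value, weight in zip(values, weights):
        blocks.append([value, weight, 1])
        # Merge backwards while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, c1 + c2])
    fit = []
    for mean, _, count in blocks:
        fit.extend([mean] * count)
    return fit


def potential_crps(x, y):
    """In-sample mean CRPS of IDR fitted to model output x and outcomes y."""
    n = len(x)
    # Group observation indices by unique model output, in increasing order.
    groups, last = [], None
    for i in sorted(range(n), key=lambda i: x[i]):
        if groups and x[i] == last:
            groups[-1].append(i)
        else:
            groups.append([i])
            last = x[i]
    weights = [len(g) for g in groups]

    z = sorted(set(y))  # the fitted CDFs only jump at observed outcomes
    total = 0.0
    for k in range(len(z) - 1):
        width = z[k + 1] - z[k]
        # Per-group mean of the indicator 1{y <= z_k}.
        means = [sum(1.0 for i in g if y[i] <= z[k]) / len(g) for g in groups]
        # The CDF value at z_k must be non-increasing in x, so fit a
        # non-decreasing sequence on the reversed order and flip it back.
        cdf = pava_nondecreasing(means[::-1], weights[::-1])[::-1]
        # On [z_k, z_{k+1}) both the CDF and the indicator are constant,
        # so the CRPS integral reduces to squared error times interval width.
        for g, f in zip(groups, cdf):
            for i in g:
                indicator = 1.0 if y[i] <= z[k] else 0.0
                total += (f - indicator) ** 2 * width
    return total / n


# Outcome is a non-decreasing function of the model output: PC is zero.
print(potential_crps([1, 2, 3, 4], [10, 20, 20, 35]))  # 0.0
# Uninformative model output: PC equals half the mean absolute difference.
print(potential_crps([5, 5], [0, 1]))  # 0.25
```

The two calls at the end illustrate the stated properties on toy data: a perfectly monotone relationship gives zero, and a constant model output gives the upper bound. The loop over thresholds costs on the order of n times the number of distinct outcomes, which is fine for illustration but would need a more efficient IDR routine on gridded data such as WeatherBench 2.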
Authors

T. Gneiting
Computational Statistics group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany; Institute of Statistics, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany

Tobias Biegert
PhD Student, Karlsruhe Institute of Technology
Probabilistic Forecasting, Machine Learning, Weather Forecasting

Kristof Kraus
Institute for Stochastics, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany

Eva-Maria Walz
Computational Statistics group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany; Institute of Statistics, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany

Alexander I. Jordan
Heidelberg Institute for Theoretical Studies (HITS)
Forecasting

Sebastian Lerch
University of Marburg
Statistics and Probability, Forecasting, Machine Learning