🤖 AI Summary
This paper addresses a limitation of conventional weather forecast evaluation: it relies heavily on statistical accuracy and neglects decision utility. In response, the authors propose a value-oriented evaluation paradigm grounded in the decision-maker’s perspective. Methodologically, the paper applies a “decision calibration” framework that combines decision theory, probabilistic calibration analysis, and multi-task utility assessment to compare machine learning and numerical weather prediction models under realistic decision-making scenarios. The key finding is an empirical mismatch between statistical performance and decision utility: the same forecast model can rank very differently across decision tasks (e.g., disaster mitigation vs. energy dispatch), so traditional metrics alone are inadequate for application-specific model selection. The framework provides an interpretable, task-adapted basis for quantifying the value of a forecast and guiding operational model choice.
📝 Abstract
Standard weather forecast evaluations focus on the forecaster's perspective and on a statistical assessment comparing forecasts and observations. In practice, however, forecasts are used to make decisions, so it seems natural to take the decision-maker's perspective and quantify the value of a forecast by its ability to improve decision-making. Decision calibration provides a novel framework for evaluating forecast performance at the decision level rather than the forecast level. We use decision calibration to compare machine learning and classical numerical weather prediction models on various weather-dependent decision tasks. We find that model performance at the forecast level does not reliably translate to performance in downstream decision-making: some performance differences only become apparent at the decision level, and model rankings can change among different decision tasks. Our results confirm that typical forecast evaluations are insufficient for selecting the optimal forecast model for a specific decision task.
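To make the forecast-level versus decision-level distinction concrete, the following is a minimal sketch using a toy binary cost-loss model. It is not the paper's experimental setup. The forecasts, outcomes, costs, and all function names are made up for illustration. The "gap" reported here follows one common reading of decision calibration (the loss a decision-maker expects under the forecast should match the loss actually incurred), which is an assumption and may differ from the paper's exact formulation.

```python
# Toy illustration only: data, costs, and helper names are hypothetical.

def brier_score(probs, outcomes):
    """Forecast-level metric: mean squared error of event probabilities."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

def bayes_action(p, cost, loss):
    """Protect (1) when the expected loss of not protecting exceeds the cost."""
    return 1 if p * loss > cost else 0

def decision_report(probs, outcomes, cost, loss):
    """Decision-level metrics for one cost-loss decision task."""
    actions = [bayes_action(p, cost, loss) for p in probs]
    # Loss actually incurred, given what really happened.
    actual = sum(cost if a else y * loss
                 for a, y in zip(actions, outcomes)) / len(probs)
    # Loss the decision-maker expects if the forecast is taken at face value.
    predicted = sum(cost if a else p * loss
                    for a, p in zip(actions, probs)) / len(probs)
    return {"actual": actual, "predicted": predicted,
            "gap": abs(predicted - actual)}

# Ten made-up cases: the event occurs in the first two.
outcomes = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
forecast_a = [0.9] + [0.05] * 9        # sharp, but misses the second event
forecast_b = [0.6, 0.6] + [0.3] * 8    # cautious, never very confident

# Two hypothetical decision tasks, given as (protection cost, unprotected loss).
tasks = {"cheap protection": (1, 12), "expensive protection": (3, 4)}

brier = {"A": brier_score(forecast_a, outcomes),
         "B": brier_score(forecast_b, outcomes)}
reports = {
    name: {"A": decision_report(forecast_a, outcomes, cost, loss),
           "B": decision_report(forecast_b, outcomes, cost, loss)}
    for name, (cost, loss) in tasks.items()
}

print("Brier score:", brier)
for name, by_model in reports.items():
    print(name, {m: r["actual"] for m, r in by_model.items()})
```

With these made-up numbers, forecast A has the lower (better) Brier score, yet forecast B incurs the lower realized loss when protection is cheap, while A does better when protection is expensive. This mirrors, in miniature, the abstract's point that rankings at the forecast level need not carry over to the decision level and can change between decision tasks.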