AI Summary
This study addresses the limitations of existing quantile-based approaches for evaluating battery arbitrage, which fail to capture the economic value of probabilistic forecasting models accurately: they neglect temporal dependencies in electricity prices and are not incentive compatible. To overcome these issues, the authors propose a stochastic programming framework that uses full predictive distributions of day-ahead prices as input to energy storage decisions. This enables a systematic assessment of how forecast quality influences decision performance under varying risk preferences. Empirical analysis using German electricity market data demonstrates that conventional quantile-based strategies can mislead model ranking, whereas the proposed fully probabilistic approach provides a more reliable measure of a forecasting model's economic value, offering a sounder basis for application-oriented forecast evaluation.
Abstract
Electricity price forecasting supports decision-making in energy markets and asset operation. Probabilistic forecasts are increasingly adopted to explicitly quantify uncertainty, typically issued as quantile predictions or ensembles of the full predictive distribution. However, how improvements in statistical forecast quality translate into economic value remains unclear. Battery storage arbitrage in day-ahead markets is a popular application-based benchmark for this purpose. We analyze quantile-based trading strategies (QBTS) and identify two critical flaws: they do not incentivize honest probabilistic forecasting and they ignore the intertemporal dependence structure of electricity prices. We therefore frame battery optimization as a stochastic program based on fully probabilistic forecasts and examine decision quality measurement for risk-neutral and risk-averse settings under different uncertainty models. Our discussion touches on both sides of the coin: how reliable is the economic evaluation of forecasting models through (simplified) application studies, and how do improvements in statistical forecast quality for stochastic programs relate to decision quality and economic performance? We provide theoretical justification and empirical evidence from a case study on the German electricity market. Our results highlight the pitfalls of ranking forecasting models through battery trading strategies. We conclude with implications for evaluation practice and directions for future research in application-based forecast assessment.
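To illustrate why the intertemporal dependence structure matters, the following minimal sketch evaluates a single charge/discharge cycle on two toy scenario sets. It is not the authors' formulation: the function names (`lower_tail_cvar`, `evaluate_cycle`), the price values, and the choice of CVaR as the risk measure are assumptions made purely for illustration. Both scenario sets have identical per-hour marginal distributions, so any strategy that sees only hourly quantiles cannot tell them apart.

```python
import numpy as np


def lower_tail_cvar(profits, alpha):
    """Mean of the worst alpha-fraction of profits (assumed risk measure)."""
    ordered = np.sort(np.asarray(profits, dtype=float))
    n_tail = max(1, int(np.ceil(alpha * len(ordered))))
    return ordered[:n_tail].mean()


def evaluate_cycle(scenarios, charge_hour, discharge_hour, alpha):
    """Expected profit and lower-tail CVaR of buying 1 MWh at charge_hour
    and selling it at discharge_hour (no losses, no fees: a toy setting).

    scenarios: array of shape (n_scenarios, n_hours) with price paths.
    """
    spread = scenarios[:, discharge_hour] - scenarios[:, charge_hour]
    return spread.mean(), lower_tail_cvar(spread, alpha)


# Hypothetical price scenarios in EUR/MWh; rows are scenarios, columns are hours.
# Hour 0 takes the values {20, 30, 40, 50} and hour 1 the values {40, 50, 60, 70}
# in BOTH sets, so the hourly marginals (and hence all hourly quantiles) coincide.
comonotone = np.array([[20, 40], [30, 50], [40, 60], [50, 70]], dtype=float)
countermonotone = np.array([[20, 70], [30, 60], [40, 50], [50, 40]], dtype=float)

for name, scen in [("comonotone", comonotone), ("countermonotone", countermonotone)]:
    mean_profit, cvar_profit = evaluate_cycle(scen, 0, 1, alpha=0.25)
    print(f"{name}: expected profit = {mean_profit}, CVaR(25%) = {cvar_profit}")
```

In this toy setting the expected spread is the same for both sets, because expectation is linear and depends only on the marginals. The lower-tail CVaR differs, because the spread distribution depends on how the two hours move together. A risk-averse evaluation therefore needs the joint distribution, which hourly quantiles alone do not provide.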