🤖 AI Summary
To address the trade-off between modeling accuracy and computational efficiency in battery arbitrage for European implicit balancing markets, this paper proposes a framework that combines model predictive control (MPC) with model-free reinforcement learning (RL). The method incorporates forecasts into the RL decision-making process, as MPC does, while preserving RL's fast inference. Evaluated on Belgian balancing market data from 2023, the method achieves 16.15% higher arbitrage profit than standalone RL and 54.36% higher than standalone MPC. The key contribution is the coupling of forecast information with model-free decision-making, which combines the complementary strengths of data-driven and model-driven approaches.
📝 Abstract
In Europe, profit-seeking balance responsible parties can deviate in real time from their day-ahead nominations to assist transmission system operators in maintaining the supply-demand balance. Model predictive control (MPC) strategies for implicit balancing can capture the resulting arbitrage opportunities, but they fail to accurately capture the price-formation process in the European imbalance markets and face high computational costs. Model-free reinforcement learning (RL) methods are fast to execute, but require data-intensive training and usually rely on real-time and historical data for decision-making. This paper proposes an MPC-guided RL method that combines the complementary strengths of MPC and RL. The proposed method can effectively incorporate forecasts into the decision-making process (as in MPC), while maintaining the fast inference capability of RL. The performance of the proposed method is evaluated on the implicit balancing battery control problem using Belgian balancing data from 2023. First, we analyze the performance of the standalone state-of-the-art RL and MPC methods from various angles to highlight their individual strengths and limitations. Next, we show that the proposed MPC-guided RL method increases arbitrage profit by 16.15% and 54.36% compared to standalone RL and standalone MPC, respectively.
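To make the implicit balancing battery control problem concrete, the following is a minimal sketch of a battery that charges or discharges against an imbalance price, driven by a simple forecast-aware rule. It is not the paper's MPC formulation or its RL agent. The `Battery` class, the `simulate` helper, the battery parameters, the 15-minute period length, the perfect-foresight forecast, and the threshold rule are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Battery:
    # All parameter values are illustrative assumptions, not taken from the paper.
    capacity_mwh: float = 2.0   # usable energy capacity
    max_power_mw: float = 1.0   # grid-side charge/discharge power limit
    efficiency: float = 0.9     # one-way conversion efficiency
    soc_mwh: float = 1.0        # current state of charge

    def step(self, power_mw: float, price: float, dt_h: float = 0.25) -> float:
        """Apply grid-side power for one period and return the profit in EUR.

        power_mw > 0 charges (energy is bought), power_mw < 0 discharges
        (energy is sold). The action is clipped to the power and energy limits.
        """
        p = max(-self.max_power_mw, min(self.max_power_mw, power_mw))
        if p >= 0:
            # Charging: cannot store more than the remaining headroom.
            headroom = self.capacity_mwh - self.soc_mwh
            p = min(p, headroom / (self.efficiency * dt_h))
            self.soc_mwh += p * self.efficiency * dt_h
        else:
            # Discharging: cannot deliver more than the stored energy allows.
            p = max(p, -self.soc_mwh * self.efficiency / dt_h)
            self.soc_mwh += p / self.efficiency * dt_h
        # Buying energy costs money, selling energy earns money.
        return -p * dt_h * price


def simulate(prices: list[float], horizon: int = 2) -> float:
    """Run a toy forecast-aware threshold rule and return the total profit.

    Assumption: the forecast is simply the next `horizon` true prices.
    The rule charges when the current price is below the mean forecast,
    discharges when it is above, and idles when no forecast is available.
    """
    battery = Battery()
    total = 0.0
    for t, price in enumerate(prices):
        forecast = prices[t + 1 : t + 1 + horizon]
        if not forecast:
            action = 0.0
        else:
            mean_forecast = sum(forecast) / len(forecast)
            if price < mean_forecast:
                action = battery.max_power_mw
            elif price > mean_forecast:
                action = -battery.max_power_mw
            else:
                action = 0.0
        total += battery.step(action, price)
    return total


if __name__ == "__main__":
    # Hypothetical imbalance prices in EUR/MWh for four 15-minute periods.
    print(simulate([50.0, 20.0, 100.0, 30.0]))
```

The threshold rule stands in for any fast decision function that consumes forecasts. In the paper's setting, that role is played by the MPC-guided RL policy, whose internals are not specified in this abstract.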