🤖 AI Summary
This study addresses the challenge of balancing revenue maximization and risk management in wholesale electricity markets with high renewable penetration, where bidding strategies must account for both day-ahead (DA) and real-time (RT) settlements. To this end, the authors develop a high-fidelity two-settlement bidding simulation environment grounded in empirical PJM market data and propose MARS-DA, a hierarchical multi-agent reinforcement learning framework. In MARS-DA, a meta-controller dynamically blends the actions of a “safe agent” and a “speculator agent” to produce risk-aware bids. The authors present the environment as the first open-source, standardized reinforcement learning benchmark tailored to two-settlement electricity markets. Experiments show that the proposed approach improves risk-adjusted returns over existing methods under extreme price volatility and adapts robustly to changing market regimes.
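The summary says only that the meta-controller "blends" the two agents' actions. The sketch below shows one plausible blending rule: a convex combination in which a single gate weight, produced by squashing a meta-controller logit through a sigmoid, decides how much of the bid follows the safe agent. The gate form, the names `blend_actions` and `meta_logit`, and the per-hour action vectors are all hypothetical illustrations, not the paper's published design.

```python
import math
from typing import List, Sequence


def stable_sigmoid(x: float) -> float:
    """Numerically stable logistic function (avoids overflow for large |x|)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def blend_actions(
    safe_action: Sequence[float],
    speculator_action: Sequence[float],
    meta_logit: float,
) -> List[float]:
    """Hypothetical blending rule: convex combination of two sub-policy actions.

    w = sigmoid(meta_logit) is the weight on the safe agent, and (1 - w) is the
    weight on the speculator agent. Each action is assumed to be a vector of
    per-hour DA bid quantities (an illustrative assumption).
    """
    if len(safe_action) != len(speculator_action):
        raise ValueError("sub-policy actions must have the same length")
    w = stable_sigmoid(meta_logit)
    return [w * s + (1.0 - w) * p for s, p in zip(safe_action, speculator_action)]


# Usage: a neutral gate (logit 0 -> w = 0.5) averages the two proposals.
print(blend_actions([10.0, 20.0], [0.0, 40.0], 0.0))  # [5.0, 30.0]
```

Because the weights sum to one, the blended bid always lies between the two agents' proposals, which keeps the meta-controller's output inside the range the base agents already consider reasonable.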
📝 Abstract
The increasing penetration of renewable energy has introduced substantial volatility into wholesale electricity markets, complicating optimal bidding for power producers. Traditional Reinforcement Learning (RL) approaches often struggle to balance profit maximization with risk management, frequently overfitting to specific market conditions or failing to account for the stochastic spread between Day-Ahead (DA) and Real-Time (RT) settlements. To address these challenges, this paper makes two primary contributions. First, we introduce and open-source a high-fidelity Gymnasium environment for two-settlement electricity market bidding. Grounded in extensive empirical data from the PJM Interconnection, the environment explicitly models the interplay between DA commitments and RT deviations, providing a standardized testbed for both general and risk-sensitive agents. Second, we propose MARS-DA (Multi-Agent Regime-Switching for Day-Ahead markets), a novel hierarchical framework that orchestrates distinct sub-policies for risk management and profit seeking. MARS-DA uses a top-level Meta-Controller to dynamically blend the actions of two specialized base agents: a "Safe Agent" that optimizes for reliable DA allocation and a "Speculator Agent" that targets volatile RT arbitrage opportunities. Extensive experiments demonstrate that MARS-DA achieves superior risk-adjusted returns compared to state-of-the-art baselines while maintaining robust regime alignment during periods of extreme market volatility.
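The interplay between DA commitments and RT deviations can be made concrete with the standard two-settlement rule: the DA commitment is paid at the DA price, and any real-time deviation from it is settled at the RT price, so revenue = P_DA * Q_DA + P_RT * (Q_actual - Q_DA). The sketch below assumes this textbook rule for a single hour; the environment described in the abstract may add costs, penalties, or other terms, and the names `HourOutcome` and `two_settlement_revenue` are hypothetical. The second function rewrites the same quantity to expose the DA/RT spread term that the abstract identifies as the source of risk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HourOutcome:
    """One settlement hour (all fields are illustrative assumptions)."""
    da_price: float     # $/MWh, cleared day-ahead price
    rt_price: float     # $/MWh, real-time price
    da_quantity: float  # MWh committed in the DA market
    delivered: float    # MWh actually produced in real time


def two_settlement_revenue(o: HourOutcome) -> float:
    """DA commitment paid at the DA price; deviation settled at the RT price."""
    da_revenue = o.da_price * o.da_quantity
    rt_deviation = o.delivered - o.da_quantity  # negative means buy-back in RT
    return da_revenue + o.rt_price * rt_deviation


def spread_decomposition(o: HourOutcome) -> float:
    """Same revenue written as RT value of output plus spread exposure on the DA bid."""
    return o.rt_price * o.delivered + (o.da_price - o.rt_price) * o.da_quantity


# Usage: commit 100 MWh at $30 but deliver only 80 MWh while RT clears at $50.
hour = HourOutcome(da_price=30.0, rt_price=50.0, da_quantity=100.0, delivered=80.0)
print(two_settlement_revenue(hour))  # 2000.0 (3000 from DA, minus 1000 for the RT shortfall)
print(spread_decomposition(hour))    # 2000.0
```

The decomposition shows why the DA quantity is a bet on the spread: for fixed output, every extra MWh committed day-ahead earns (P_DA - P_RT), which is exactly the quantity a risk-averse "safe" policy and an arbitrage-seeking "speculator" policy would treat differently.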