🤖 AI Summary
This work addresses the sequential decision-making challenge in power grid topology control, characterized by combinatorial explosion in the action space and costly simulation-based evaluation. The authors propose a reinforcement learning method that integrates a semi-Markov decision process with a physics-informed Gibbs prior. Decisions are triggered only when the grid enters a hazardous state. A graph neural network surrogate predicts the post-action overload risk of feasible topology actions, and these predictions are used to build a state-dependent candidate action set and to reweight policy logits, reducing exploration difficulty and online simulation overhead. The physics-based Gibbs prior is thus embedded directly in the policy's action-selection mechanism while the flexibility of a learned policy is preserved. Experiments across three benchmark environments report oracle-level performance at approximately 6× higher speed on the first benchmark, 94.6% of oracle reward with roughly 200× lower decision time on the second, and, on the most challenging benchmark, up to 255% higher reward and 284% more survived steps than a PPO baseline while remaining about 2.5× faster than a specialized engineering baseline.
📝 Abstract
Topology control for power grid operation is a challenging sequential decision-making problem because the action space grows combinatorially with the size of the grid and evaluating actions through simulation is computationally expensive. We propose a physics-informed reinforcement learning framework that combines semi-Markov control with a Gibbs prior over the action space that encodes the system's physics. A decision is taken only when the grid enters a hazardous regime, while a graph neural network surrogate predicts the post-action overload risk of feasible topology actions. These predictions are used to construct a physics-informed Gibbs prior that both selects a small state-dependent candidate set and reweights policy logits before action selection. In this way, our method reduces exploration difficulty and online simulation cost while preserving the flexibility of a learned policy. We evaluate the approach in three realistic benchmark environments of increasing difficulty. Across all settings, the proposed method achieves a strong balance between control quality and computational efficiency: it matches oracle-level performance while being approximately $6\times$ faster on the first benchmark, reaches $94.6\%$ of oracle reward with roughly $200\times$ lower decision time on the second, and on the most challenging benchmark improves over a PPO baseline by up to $255\%$ in reward and $284\%$ in survived steps while remaining about $2.5\times$ faster than a strong specialized engineering baseline. These results show that our method provides an effective mechanism for topology control in power grids.
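The mechanism described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the exponential form `exp(-beta * risk)`, the inverse temperature `beta`, the candidate-set size `k`, the loading threshold in `should_act`, and all function names are assumptions chosen for illustration. The risk values stand in for the output of the graph neural network surrogate, which is not modeled here.

```python
import numpy as np


def should_act(line_loadings, threshold=0.95):
    """Semi-Markov trigger (assumed rule): act only when some line is near overload."""
    return float(np.max(line_loadings)) >= threshold


def gibbs_prior(risk, beta=5.0):
    """Gibbs distribution p(a) proportional to exp(-beta * risk[a]) (assumed form)."""
    z = -beta * np.asarray(risk, dtype=float)
    z -= z.max()  # shift for numerical stability; does not change the distribution
    p = np.exp(z)
    return p / p.sum()


def select_action(policy_logits, risk, k=3, beta=5.0, rng=None):
    """Keep the k actions with the highest prior mass, then reweight policy logits."""
    rng = np.random.default_rng() if rng is None else rng
    prior = gibbs_prior(risk, beta)
    # State-dependent candidate set: indices of the k highest-prior actions.
    candidates = np.argsort(prior)[-k:]
    # Reweight: adding the log-prior to the logits multiplies the probabilities.
    logits = np.asarray(policy_logits, dtype=float)[candidates] + np.log(prior[candidates])
    logits -= logits.max()
    probs = np.exp(logits)
    probs /= probs.sum()
    choice = rng.choice(len(candidates), p=probs)
    return int(candidates[choice]), candidates, probs


# Hypothetical surrogate risk predictions and policy logits for five actions.
risk = [0.9, 0.1, 0.5, 1.2, 0.3]
policy_logits = [0.2, 0.0, 1.0, 2.0, -0.5]

if should_act([0.62, 0.97, 0.40]):
    action, candidates, probs = select_action(
        policy_logits, risk, k=3, rng=np.random.default_rng(0)
    )
```

In this sketch the action with the highest raw policy logit (index 3) is also the one with the highest predicted risk, so it is excluded from the candidate set before sampling. Only the `k` retained actions would then need to be checked by a simulator, which is how a prior of this kind could reduce online simulation cost.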