🤖 AI Summary
This study addresses how building heating systems can respond to grid-side flexibility dispatch signals while maintaining indoor thermal comfort. The authors propose a reinforcement learning control framework in which a Deep Deterministic Policy Gradient (DDPG) agent learns a heating strategy by interacting with a building thermal dynamics model, and a real-time adaptive safety filter enforces compliance with system operator flexibility requests. In simulation, the proposed method achieves up to 50% savings in energy and cost compared to a rule-based controller and outperforms a standalone deep reinforcement learning controller on the same metrics. It complies fully with flexibility requests, at the cost of only a slight increase in comfort temperature violations.
📝 Abstract
Buildings account for approximately 40% of global energy consumption. With the growing share of intermittent renewable energy sources, enabling demand-side flexibility, particularly in heating, ventilation and air conditioning systems, is essential for grid stability and energy efficiency. This paper presents a safe deep reinforcement learning-based control framework that optimizes building space heating while providing demand-side flexibility to power system operators. A deep deterministic policy gradient algorithm serves as the core deep reinforcement learning method: the controller learns an optimal heating strategy through interaction with the building thermal model while maintaining occupant comfort, minimizing energy cost, and providing flexibility. To address safety concerns with reinforcement learning, particularly regarding compliance with flexibility requests, we propose a real-time adaptive safety filter that keeps the system within predefined constraints during demand-side flexibility provision. The proposed safety filter guarantees full compliance with flexibility requests from system operators and improves energy and cost efficiency, achieving up to 50% savings compared to a rule-based controller. It also outperforms a standalone deep reinforcement learning-based controller in energy and cost metrics, with only a slight increase in comfort temperature violations.
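To make the idea of a safety filter concrete, the sketch below shows one simple way a learned controller's action could be constrained before it reaches the heater. It is an illustration only, not the paper's real-time adaptive filter: the abstract does not specify the filter's form, so the plain clipping rule, the `FlexRequest` type, the single-zone resistance-capacitance temperature update, and all parameter values are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FlexRequest:
    """Hypothetical flexibility request: an upper bound on heating power."""
    max_power_kw: float


def safety_filter(action_kw: float, rated_power_kw: float,
                  request: Optional[FlexRequest]) -> float:
    """Project a proposed heating power onto the currently allowed range.

    Assumption: a flexibility request is expressed as a power cap, and
    compliance is enforced by clipping. The paper's filter may differ.
    """
    upper = rated_power_kw
    if request is not None:
        # During a flexibility event, the operator's cap tightens the bound.
        upper = min(upper, request.max_power_kw)
    upper = max(upper, 0.0)  # a cap can never force negative heating power
    return min(max(action_kw, 0.0), upper)


def step_temperature(t_in_c: float, t_out_c: float, power_kw: float,
                     r_k_per_kw: float = 5.0, c_kwh_per_k: float = 2.0,
                     dt_h: float = 0.25) -> float:
    """One explicit-Euler step of an assumed single-zone 1R1C thermal model."""
    heat_flow_kw = (t_out_c - t_in_c) / r_k_per_kw + power_kw
    return t_in_c + dt_h * heat_flow_kw / c_kwh_per_k


if __name__ == "__main__":
    t_in = 20.0
    proposed_kw = 6.0  # stand-in for the action a DDPG actor might propose
    for hour_index in range(4):
        # Hypothetical event: the operator caps power during the last two steps.
        request = FlexRequest(max_power_kw=1.0) if hour_index >= 2 else None
        applied_kw = safety_filter(proposed_kw, rated_power_kw=4.0, request=request)
        t_in = step_temperature(t_in, t_out_c=0.0, power_kw=applied_kw)
        print(hour_index, applied_kw, round(t_in, 2))
```

Because the filter sits between the policy and the plant, the power cap holds regardless of what the policy proposes. This is consistent with the reported trade-off: compliance is guaranteed by construction, while the indoor temperature may drift outside the comfort band when the cap is tight.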