Counter-Dyna: Data-Efficient RL-Based HVAC Control using Counterfactual Building Models

📅 2026-05-06

📝 Abstract
Model-based reinforcement learning (MBRL) offers a promising approach for data-efficient energy management in buildings, combining the strengths of predictive modeling and reinforcement learning. While previous MBRL methods applied to HVAC control have reduced training data requirements, they still require several months of interaction with the building to learn a satisfactory control policy. A key reason is that existing surrogate models either attempt to predict the entire state space, including variables such as weather and electricity prices that are unaffected by control actions, or ignore these variables entirely. To address these issues, we propose Counter-Dyna, a method that enhances the data efficiency of Dyna, an MBRL method. We create data-efficient counterfactual surrogate models (CSMs) by leveraging invariances in the state space. Using a CSM in Dyna reduces the amount of environment interaction data needed for RL training compared with previous results: whereas the previous state of the art used 6-12 months of environment interactions, our method needs only 5 weeks. We evaluate our method in a large simulation study using the literature-standard BOPTEST framework and proximal policy optimization (PPO) as the RL algorithm. Our results show cost-saving potentials of 5.3% to 17.0% in a hypothetical deployment scenario. Our work is a significant step towards making real-world deployment of RL algorithms in HVAC control practically viable.
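The invariance the abstract describes (weather and prices do not depend on control actions) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a toy first-order thermal model standing in for the real building, a linear least-squares surrogate for the indoor temperature only, and recorded weather and price traces that are replayed unchanged under different policies. All function and variable names (`simulate_building`, `fit_endogenous_model`, `counterfactual_rollout`) are hypothetical.

```python
import numpy as np


def simulate_building(temp, outdoor, action):
    """Hypothetical toy 'real' building with first-order thermal dynamics."""
    return 0.9 * temp + 0.1 * outdoor + 2.0 * action


def fit_endogenous_model(temps, outdoors, actions, next_temps):
    """Least-squares fit of T[t+1] ~ w . [T[t], T_out[t], u[t], 1].

    Only the action-dependent variable (indoor temperature) is modeled.
    Weather and price are never predicted.
    """
    features = np.column_stack([temps, outdoors, actions, np.ones_like(temps)])
    weights, *_ = np.linalg.lstsq(features, next_temps, rcond=None)
    return weights


def counterfactual_rollout(weights, start_temp, outdoor_trace, price_trace, policy):
    """Replay recorded exogenous traces and predict only the indoor temperature."""
    temp, temps, cost = start_temp, [], 0.0
    for outdoor, price in zip(outdoor_trace, price_trace):
        action = policy(temp, outdoor, price)
        cost += price * action  # assumed cost: price times heating power
        temp = weights @ np.array([temp, outdoor, action, 1.0])
        temps.append(temp)
    return np.array(temps), cost


# Collect a short log of interactions under a random exploration policy.
rng = np.random.default_rng(0)
n = 200
outdoor = 5.0 + 5.0 * np.sin(np.linspace(0.0, 8.0 * np.pi, n))
price = rng.uniform(0.1, 0.4, size=n)
actions = rng.uniform(0.0, 1.0, size=n)
temps = np.empty(n + 1)
temps[0] = 20.0
for t in range(n):
    temps[t + 1] = simulate_building(temps[t], outdoor[t], actions[t])

weights = fit_endogenous_model(temps[:-1], outdoor, actions, temps[1:])

# Ask "what if" questions on the same recorded weather and prices.
def always_off(temp, outdoor, price):
    return 0.0


def always_on(temp, outdoor, price):
    return 1.0


temps_off, cost_off = counterfactual_rollout(weights, 20.0, outdoor, price, always_off)
temps_on, cost_on = counterfactual_rollout(weights, 20.0, outdoor, price, always_on)
```

In a Dyna-style loop, rollouts like these would supply additional simulated transitions to the RL algorithm between real environment steps. Because the surrogate has to learn only the action-dependent part of the state, it plausibly needs less data than a model of the full state space, which is the intuition the abstract gives for the data-efficiency gain.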
Problem

Research questions and friction points this paper is trying to address.

data-efficient reinforcement learning
HVAC control
model-based reinforcement learning
building energy management
surrogate models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Counterfactual Surrogate Model
Model-based Reinforcement Learning
Data Efficiency
HVAC Control
Dyna