Counter-Dyna: Data-Efficient RL-Based HVAC Control using Counterfactual Building Models

📅 2026-05-06

📈 Citations: 0

✨ Influential: 0

📝 Abstract

Model-based reinforcement learning (MBRL) offers a promising approach for data-efficient energy management in buildings, combining the strengths of predictive modeling and reinforcement learning. While previous MBRL methods applied to HVAC control have reduced training data requirements, they still require several months of interaction with the building to learn a satisfactory control policy. A key reason is that existing surrogate models either attempt to predict the entire state space, including weather and electricity prices, which are unaffected by control actions, or ignore these variables completely. To address these issues, we propose Counter-Dyna, a method that enhances the data efficiency of Dyna, an MBRL method. We create data-efficient counterfactual surrogate models (CSMs) by leveraging invariances in the state space. Using a CSM in Dyna reduces the amount of environment interaction data needed for RL training compared with previous results: whereas the previous state of the art used 6 to 12 months of environment interactions, our method needs only 5 weeks. We evaluate our method in a large simulation study using the literature-standard BOPTEST framework, with proximal policy optimization (PPO) as the RL algorithm. Our results show cost-saving potentials of 5.3% to 17.0% in a hypothetical deployment scenario. Our work is a significant step towards making real-world deployment of RL algorithms in HVAC control practically viable.
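
The idea of exploiting action-invariant state variables can be illustrated with a small sketch. It is not the paper's implementation: the linear zone-temperature model, the energy-cost formula, and all names (`fit_endogenous_model`, `counterfactual_rollout`, the thermostat policy) are assumptions chosen for illustration. The sketch fits a surrogate only for the state that the controller can influence (zone temperature) and replays logged weather and price data unchanged, so a Dyna-style learner could generate synthetic transitions for actions that were never taken.

```python
import numpy as np


def fit_endogenous_model(zone_temp, outdoor_temp, action):
    """Least-squares fit of T[t+1] = w . [T[t], T_out[t], u[t], 1].

    Only the controllable state (zone temperature) is modeled;
    weather and prices are never predicted. The linear form is an
    illustrative assumption, not the paper's surrogate model.
    """
    features = np.column_stack([
        zone_temp[:-1],
        outdoor_temp[:-1],
        action[:-1],
        np.ones(len(zone_temp) - 1),
    ])
    targets = zone_temp[1:]
    w, *_ = np.linalg.lstsq(features, targets, rcond=None)
    return w


def counterfactual_rollout(w, start_temp, outdoor_temp, price, policy):
    """Replay logged exogenous data while simulating a different policy.

    Returns (temp, outdoor_temp, price, action, cost, next_temp) tuples
    that could be added to an RL agent's training data.
    """
    temp = start_temp
    transitions = []
    for t in range(len(outdoor_temp) - 1):
        u = policy(temp, outdoor_temp[t], price[t])
        next_temp = w @ np.array([temp, outdoor_temp[t], u, 1.0])
        cost = price[t] * u  # hypothetical energy cost: price times heating effort
        transitions.append((temp, outdoor_temp[t], price[t], u, cost, next_temp))
        temp = next_temp
    return transitions


# Synthetic "logged" data with known dynamics (assumed for the demo).
rng = np.random.default_rng(0)
n = 200
outdoor = 5.0 + 5.0 * np.sin(np.arange(n) * 2.0 * np.pi / 24.0)
price = 0.2 + 0.1 * rng.random(n)
logged_action = rng.uniform(0.0, 1.0, n)
true_w = np.array([0.9, 0.1, 2.0, 0.0])
zone = np.empty(n)
zone[0] = 20.0
for t in range(n - 1):
    zone[t + 1] = true_w @ np.array([zone[t], outdoor[t], logged_action[t], 1.0])

w_hat = fit_endogenous_model(zone, outdoor, logged_action)

# A policy that was never executed in the log: a simple thermostat.
synthetic = counterfactual_rollout(
    w_hat, zone[0], outdoor, price,
    policy=lambda temp, t_out, p: 0.0 if temp > 21.0 else 1.0,
)
```

In a Dyna-style loop, transitions such as `synthetic` would be mixed with real interaction data when updating the policy. Because the weather and price columns come straight from the log, the surrogate only has to learn the part of the dynamics that actually depends on the control action.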

Problem

Research questions and friction points this paper is trying to address.

data-efficient reinforcement learning

HVAC control

model-based reinforcement learning

building energy management

surrogate models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Counterfactual Surrogate Model

Model-based Reinforcement Learning

Data Efficiency