🤖 AI Summary
This study addresses critical challenges in deploying reinforcement learning (RL) and model predictive control (MPC) for residential HVAC systems: scalability, safety, interpretability, and sample efficiency. It reports the first one-month closed-loop comparative experiment in an occupied residential setting, in which a physics-informed MPC and a model-based RL (MBRL) controller, each leveraging learned dynamic system models, were deployed in turn on a heat pump to jointly optimize energy efficiency and thermal comfort. Results show that RL achieves 22% energy savings, marginally exceeding MPC's 20%, but that MPC delivers superior energy efficiency at equivalent comfort levels. The study quantifies RL's advantage in reducing modeling effort while empirically identifying key practical bottlenecks: unsafe policy initialization, instability during online adaptation, and actuation deviation. These findings establish a crucial benchmark and provide actionable design insights for operationalizing intelligent building control algorithms.
📝 Abstract
Advanced control strategies like Model Predictive Control (MPC) offer significant energy savings for HVAC systems but often require substantial engineering effort, limiting scalability. Reinforcement Learning (RL) promises greater automation and adaptability, yet its practical application in real-world residential settings remains largely undemonstrated, facing challenges related to safety, interpretability, and sample efficiency. To investigate these practical issues, we performed a direct comparison of an MPC and a model-based RL controller, with each controller deployed for a one-month period in an occupied house with a heat pump system in West Lafayette, Indiana. This investigation aimed to assess the scalability of the chosen RL and MPC implementations while ensuring safety and comparability. The advanced controllers were evaluated against each other and against the existing controller. RL achieved substantial energy savings (22% relative to the existing controller), slightly exceeding MPC's savings (20%), albeit with modestly higher occupant discomfort. However, when energy savings were normalized for the level of comfort provided, MPC demonstrated superior performance. This study's empirical results show that while RL reduces engineering overhead, it introduces practical trade-offs in model accuracy and operational robustness. The key lessons learned concern the difficulties of safe controller initialization, navigating the mismatch between control actions and their practical implementation, and maintaining the integrity of online learning in a live environment. These insights pinpoint the essential research directions needed to advance RL from a promising concept to a truly scalable HVAC control solution.
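The headline figures (22% for RL, 20% for MPC) are relative-savings ratios against the existing controller's consumption. A minimal sketch of that computation, using hypothetical kWh totals chosen only to reproduce the reported percentages (the abstract does not give the actual consumption values):

```python
def pct_savings(baseline_kwh: float, controller_kwh: float) -> float:
    """Percentage energy savings of a controller relative to the baseline
    (existing) controller over the same evaluation period."""
    return 100.0 * (baseline_kwh - controller_kwh) / baseline_kwh

# Hypothetical monthly consumption totals (illustrative only):
baseline = 1000.0  # existing controller
rl = 780.0         # model-based RL controller
mpc = 800.0        # physics-informed MPC

print(pct_savings(baseline, rl))   # 22.0
print(pct_savings(baseline, mpc))  # 20.0
```

Note that this raw ratio ignores comfort: the study's comfort-normalized comparison, under which MPC comes out ahead, additionally accounts for the occupant discomfort each controller incurred.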