🤖 AI Summary
Cross-domain HVAC energy consumption forecasting suffers from data scarcity, strong heterogeneity across building environments, and spurious correlations that induce overfitting. To address these challenges, we propose CaberNet, the first causal representation learning framework designed specifically for HVAC energy prediction. CaberNet identifies causal features via global feature gating with self-supervised Bernoulli regularization, and enforces domain-invariant Markov blanket representations through domain-balanced training and latent-factor independence constraints, without requiring expert priors or strong structural assumptions. CaberNet integrates deep sequential modeling with self-supervised learning and is evaluated on three real-world building datasets spanning markedly distinct climate zones. It achieves a 22.9% reduction in normalized mean squared error over the best baseline, demonstrating substantial improvements in cross-domain generalization and model interpretability.
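The global feature gate with Bernoulli regularization described above might look roughly like the following minimal sketch. The function names, the sigmoid parameterization of the keep-probabilities, and the KL-to-a-sparse-prior regularizer are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_features(x, gate_logits, rng, hard=False):
    """Apply a global (per-feature, shared across samples) gate:
    feature d is kept with probability p_d = sigmoid(gate_logits[d]).
    With hard=True a Bernoulli mask is sampled; otherwise the
    expected (relaxed) gate p is used for deterministic evaluation."""
    p = sigmoid(gate_logits)
    if hard:
        mask = (rng.random(p.shape) < p).astype(x.dtype)
    else:
        mask = p
    return x * mask, p

def bernoulli_regularizer(p, prior=0.3, eps=1e-8):
    """KL(Bern(p_d) || Bern(prior)) summed over features; pulling the
    gate probabilities toward a sparse prior encourages the model to
    keep only a small causal subset of the input features."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(np.sum(p * np.log(p / prior)
                        + (1.0 - p) * np.log((1.0 - p) / (1.0 - prior))))
```

In training, the regularizer would be added to the prediction loss so that gates on spurious features are driven toward zero while gates on predictive (causal) features stay open.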
📝 Abstract
Cross-domain HVAC energy prediction is essential for scalable building energy management, particularly because collecting extensive labeled data for every new building is both costly and impractical. Yet, this task remains highly challenging due to the scarcity and heterogeneity of data across different buildings, climate zones, and seasonal patterns. In particular, buildings situated in distinct climatic regions introduce variability that often leads existing methods to overfit to spurious correlations, rely heavily on expert intervention, or compromise on data diversity. To address these limitations, we propose CaberNet, a causal and interpretable deep sequence model that learns invariant (Markov blanket) representations for robust cross-domain prediction. In a purely data-driven fashion and without requiring any prior knowledge, CaberNet integrates i) a global feature gate trained with a self-supervised Bernoulli regularization to distinguish superior causal features from inferior ones, and ii) a domain-wise training scheme that balances domain contributions, minimizes cross-domain loss variance, and promotes latent factor independence. We evaluate CaberNet on real-world datasets collected from three buildings located in three climatically diverse cities, and it consistently outperforms all baselines, achieving a 22.9% reduction in normalized mean squared error (NMSE) compared to the best benchmark. Our code is available at https://github.com/rickzky1001/CaberNet-CRL.
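The domain-wise training scheme in (ii), balancing domain contributions, penalizing cross-domain loss variance, and promoting latent-factor independence, might be sketched as below. The equal-weight averaging, the variance penalty, and the covariance-based independence surrogate are assumptions for illustration, not the paper's exact losses:

```python
import numpy as np

def domain_balanced_objective(per_domain_losses, lam_var=1.0):
    """Equal-weight mean of the per-domain losses plus a penalty on
    their variance, discouraging the model from fitting one domain
    (e.g., one building or climate zone) at the expense of others."""
    losses = np.asarray(per_domain_losses, dtype=float)
    return float(losses.mean() + lam_var * losses.var())

def independence_penalty(z):
    """Squared off-diagonal entries of the empirical covariance of the
    latent factors z (shape: n_samples x n_factors); zero when the
    factors are linearly uncorrelated."""
    zc = z - z.mean(axis=0, keepdims=True)
    cov = zc.T @ zc / max(len(z) - 1, 1)
    off_diag = cov - np.diag(np.diag(cov))
    return float(np.sum(off_diag ** 2))
```

A combined training loss would then be the balanced objective plus a weighted independence penalty on the encoder's latent factors.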