🤖 AI Summary
This study addresses two limitations of existing urban building energy modeling approaches: they are predominantly confined to predictive paradigms, and they lack high-quality, satellite-aligned data on building energy use and height. To overcome these challenges, the authors propose SENSE, a unified generative framework that, for the first time, enables generative modeling of building energy consumption at the urban functional level. Built on a controllable diffusion model and a large vision model, and conditioned on road networks and urban density, SENSE jointly generates high-fidelity, physically consistent satellite imagery together with the corresponding building energy and height maps in latent space. In experiments across four cities, SENSE satisfies the ASHRAE standard metric. Using less than 20% of the labeled energy data, it generates enough annotated synthetic data to improve downstream prediction by 10% IoU. Compared with state-of-the-art urban energy prediction methods, it reduces NMBE by 3%–11% and CVRMSE by 1%–9%.
📝 Abstract
Urban Building Energy Modeling (UBEM) plays a critical role in achieving the United Nations' Sustainable Development Goals 7 and 11. Although existing studies based on satellite imagery and deep learning have achieved remarkable progress, three challenges remain. First, most existing studies are inherently predictive and fail to reflect the generative nature of urban planning. Second, although generative AI and diffusion models have seen explosive growth in satellite imagery applications, they lack urban functional generation (e.g., an energy layer). Third, high-quality, high-resolution building energy data aligned with satellite imagery is scarce. Here we propose SENSE (Satellite-based ENergy Synthesis for Sustainable Environment), a unified generative UBEM framework that jointly synthesizes realistic urban satellite imagery and aligned, high-quality building energy consumption and height maps. Built on a controllable diffusion model and conditioned on road networks and urban density metrics, SENSE leverages the knowledge learned by large vision models to generate building energy consumption and height information (annotations) in the latent space. Experiments across four cities (New York City, Boston, Lyon, Busan) demonstrate that SENSE achieves high visual fidelity and strong physical consistency, satisfying the ASHRAE standard metric. They also show that SENSE can generate sufficient annotated synthetic data using less than 20% of the labeled energy data, boosting downstream prediction performance by 10% IoU. Compared to state-of-the-art (SOTA) urban energy prediction methods, SENSE significantly reduces prediction error (by 3%-11% in NMBE and 1%-9% in CVRMSE). This study offers an energy-efficient urban planning and physical generation solution for urban science, energy science, and building science. The dataset and code are available at https://huggingface.co/datasets/skl24/MUSE and https://github.com/kailaisun/GenAI4Urban-Energy/.
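For readers unfamiliar with the two error metrics named in the abstract, the following is a minimal sketch of how NMBE (normalized mean bias error) and CVRMSE (coefficient of variation of the root mean square error) are commonly computed in building energy calibration. The function names, the sample values, and the degrees-of-freedom adjustment `p=1` are assumptions made for illustration and are not taken from the SENSE code base; sign conventions for NMBE also vary between sources.

```python
import math


def nmbe(measured, predicted, p=1):
    """Normalized mean bias error, in percent.

    Assumption: residuals are taken as (measured - predicted) and the
    denominator uses (n - p) degrees of freedom; some sources use the
    opposite sign or p = 0.
    """
    n = len(measured)
    mean_measured = sum(measured) / n
    bias = sum(m - s for m, s in zip(measured, predicted))
    return 100.0 * bias / ((n - p) * mean_measured)


def cvrmse(measured, predicted, p=1):
    """Coefficient of variation of the RMSE, in percent."""
    n = len(measured)
    mean_measured = sum(measured) / n
    squared_error = sum((m - s) ** 2 for m, s in zip(measured, predicted))
    return 100.0 * math.sqrt(squared_error / (n - p)) / mean_measured


# Hypothetical energy values (arbitrary units), for illustration only.
measured = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 33.0]

print(f"NMBE:   {nmbe(measured, predicted):.2f}%")
print(f"CVRMSE: {cvrmse(measured, predicted):.2f}%")
```

NMBE captures systematic over- or under-prediction, since positive and negative residuals cancel, while CVRMSE captures the overall spread of the errors relative to the mean measured value, which is why the two are usually reported together.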