🤖 AI Summary
Urban energy modeling is often constrained by the scarcity of building-level attribute data, which limits simulation fidelity and scalability. To address this, we propose a conditional diffusion generative model for building attribute imputation, one of the first to adapt diffusion mechanisms to tabular building data. The model generates missing attributes, both discrete and continuous, conditioned on the attributes that are known. Trained on a residential building stock dataset of 2.2 million buildings, the model demonstrates high distributional fidelity in a Baltimore case study: generated attributes closely match ground-truth distributions (Kolmogorov–Smirnov test p > 0.95). This substantially improves input completeness for energy simulations and enhances downstream prediction accuracy. Our approach establishes a scalable, high-fidelity data augmentation paradigm for large-scale urban energy modeling.
📝 Abstract
Understanding current energy consumption behavior in communities is critical for informing future energy use decisions and enabling efficient energy management. Urban energy models, which simulate these energy use patterns, require large datasets with detailed building characteristics to produce accurate outcomes. However, such characteristics are often unknown, unavailable, or costly to acquire at the individual building level. In this work, we propose a generative modeling approach that produces realistic building attributes to fill these data gaps and provide complete characteristics as inputs to energy models. Our model learns complex, building-level patterns by training on a large-scale residential building stock model containing 2.2 million buildings. We employ a tabular diffusion-based framework designed to handle heterogeneous (discrete and continuous) features in tabular building data, such as occupancy, floor area, heating, cooling, and other equipment details. We develop a capability for conditional diffusion, enabling the imputation of missing building characteristics conditioned on known attributes. We validate our conditional diffusion model in two ways: first, by comparing the generated conditional distributions against the underlying data distribution, and second, through a case study of a Baltimore residential region that shows the practical utility of our approach. Our work is one of the first to demonstrate the potential of generative modeling to accelerate building energy modeling workflows.
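The first validation step, comparing generated distributions against the underlying data distribution, can be illustrated for a single continuous attribute with a two-sample Kolmogorov–Smirnov statistic. The sketch below is a minimal illustration under stated assumptions, not the authors' evaluation code: `ks_statistic` is a hypothetical helper, and the "floor area" samples are synthetic Gaussian stand-ins rather than data from the paper.

```python
import bisect
import random


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    max_gap = 0.0
    # Both empirical CDFs are step functions, so the largest gap
    # is always attained at one of the observed values.
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        max_gap = max(max_gap, abs(cdf_a - cdf_b))
    return max_gap


# Hypothetical stand-ins for one continuous attribute (e.g. floor area).
# These are synthetic draws, not values from the paper's dataset.
rng = random.Random(0)
reference = [rng.gauss(150.0, 40.0) for _ in range(2000)]
generated_good = [rng.gauss(150.0, 40.0) for _ in range(2000)]
generated_shifted = [rng.gauss(190.0, 40.0) for _ in range(2000)]

d_good = ks_statistic(reference, generated_good)
d_shifted = ks_statistic(reference, generated_shifted)
print(f"matched: D = {d_good:.3f}, shifted: D = {d_shifted:.3f}")
```

A smaller statistic means the generated sample tracks the reference distribution more closely. Discrete attributes such as heating type would need a different comparison, for example category frequencies, which this sketch does not cover.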