A Highly Configurable Framework for Large-Scale Thermal Building Data Generation to drive Machine Learning Research

Date: 2025-11-29
AI Summary
High-quality, large-scale empirical data are scarce in machine learning research on building thermal dynamics. Method: This paper proposes a low-barrier, scalable synthetic data generation framework, BuilDa, built on a single-zone Modelica thermal model that is exported as a Functional Mock-up Unit (FMU) and simulated from Python, so that users do not need deep expertise in building simulation. Contribution/Results: The framework targets the data quantity and configuration flexibility that, according to the authors, existing datasets and generators lack. The authors demonstrate it by generating a dataset and using it in a transfer learning study that fine-tunes 486 data-driven models.

๐Ÿ“ Abstract
Data-driven modeling of building thermal dynamics is emerging as an increasingly important field of research for large-scale intelligent building control. However, research in data-driven modeling using machine learning (ML) techniques requires massive amounts of thermal building data, which is not easily available. Neither empirical public datasets nor existing data generators meet the needs of ML research in terms of data quality and quantity. Moreover, existing data generation approaches typically require expert knowledge in building simulation. To fill this gap, we present a thermal building data generation framework which we call BuilDa. BuilDa is designed to produce synthetic data of adequate quality and quantity for ML research. The framework does not require profound building simulation knowledge to generate large volumes of data. BuilDa uses a single-zone Modelica model that is exported as a Functional Mock-up Unit (FMU) and simulated in Python. We demonstrate BuilDa by generating data and utilizing it for a transfer learning study involving the fine-tuning of 486 data-driven models.
Problem

Research questions and friction points this paper is trying to address.

Generates synthetic thermal building data for machine learning research
Addresses lack of quality and quantity in existing building thermal datasets
Enables data creation without deep building simulation expertise
Innovation

Methods, ideas, or system contributions that make the work stand out.

Configurable framework for large-scale thermal data generation
Uses a single-zone Modelica model that is exported as an FMU and simulated in Python
Generates synthetic data without requiring expert simulation knowledge
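
The general idea behind the points above (one single-zone thermal model, simulated from Python over many configurations) can be illustrated with a minimal sketch. This is not BuilDa's code or API: instead of a Modelica FMU it uses a hand-written 1R1C (one resistance, one capacitance) zone model integrated with forward Euler, and every name, parameter value, the sinusoidal outdoor temperature, and the thermostat rule are assumptions made only for illustration.

```python
import itertools
import math


def simulate_zone(r_k_per_w, c_j_per_k, heater_w, hours=48, dt_s=60.0, t_init_c=20.0):
    """Forward-Euler simulation of a hypothetical 1R1C zone.

    Model (an assumption, not BuilDa's Modelica model):
        C * dT/dt = (T_out - T) / R + Q_heat
    Returns one record per simulated hour.
    """
    steps_per_hour = int(3600 / dt_s)
    t_zone = t_init_c
    rows = []
    for step in range(hours * steps_per_hour):
        time_s = step * dt_s
        # Assumed outdoor temperature: daily sine wave between 0 and 10 degC.
        t_out = 5.0 + 5.0 * math.sin(2.0 * math.pi * time_s / 86400.0)
        # Assumed on/off thermostat with a 20 degC setpoint.
        q_heat = heater_w if t_zone < 20.0 else 0.0
        if step % steps_per_hour == 0:
            rows.append(
                {"time_s": time_s, "t_out_c": t_out, "t_zone_c": t_zone, "q_heat_w": q_heat}
            )
        # Explicit Euler update of the zone temperature.
        t_zone += dt_s * ((t_out - t_zone) / r_k_per_w + q_heat) / c_j_per_k
    return rows


def generate_dataset(resistances, capacitances, heater_powers):
    """Run one simulation per parameter combination (full factorial grid)."""
    dataset = {}
    for r, c, q in itertools.product(resistances, capacitances, heater_powers):
        dataset[(r, c, q)] = simulate_zone(r, c, q)
    return dataset


if __name__ == "__main__":
    # Hypothetical parameter grid: 2 x 2 x 2 = 8 building variants.
    data = generate_dataset([0.005, 0.01], [1e7, 2e7], [1000.0, 3000.0])
    print(len(data), "configurations,", len(next(iter(data.values()))), "hourly samples each")
```

In an FMU-based workflow, `simulate_zone` would presumably be replaced by a call into an FMU simulation library, while the outer sweep over configurations would stay structurally similar.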
Thomas Krug
Karlsruhe Institute of Technology, Germany
Fabian Raisch
Rosenheim Technical University of Applied Sciences, Germany
Dominik Aimer
Rosenheim Technical University of Applied Sciences, Germany
Markus Wirnsberger
Rosenheim Technical University of Applied Sciences, Germany
Ferdinand Sigg
Rosenheim Technical University of Applied Sciences, Germany
Felix Koch
Rosenheim Technical University of Applied Sciences, Germany
Benjamin Schäfer
Karlsruhe Institute of Technology
Data-driven Modelling, Machine Learning, Energy Systems, Explainable AI, Stochastic Systems
Benjamin Tischler
Rosenheim Technical University of Applied Sciences, Germany