SYNBUILD-3D: A large, multi-modal, and semantically rich synthetic dataset of 3D building models at Level of Detail 4

📅 2025-08-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing research is hindered by the absence of large-scale, publicly available 3D building datasets with fine-grained semantic annotations, impeding high-precision generative AI–driven architectural modeling. To address this, we introduce the first large-scale, multimodal synthetic LoD4 building dataset comprising over 6.2 million residential structures, each annotated with semantically consistent wireframe models, floor plan images, and LiDAR-like roof point clouds, aligned under semantic-geometric consistency constraints. Our methodology integrates rule-based architectural modeling, hierarchical semantic mapping, and physics-informed point cloud simulation to ensure cross-modal alignment and realism. The dataset and code samples are publicly available. This work advances generative architectural modeling, urban digital twin construction, and high-fidelity energy simulation.

📝 Abstract
3D building models are critical for applications in architecture, energy simulation, and navigation. Yet, generating accurate and semantically rich 3D buildings automatically remains a major challenge due to the lack of large-scale annotated datasets in the public domain. Inspired by the success of synthetic data in computer vision, we introduce SYNBUILD-3D, a large, diverse, and multi-modal dataset of over 6.2 million synthetic 3D residential buildings at Level of Detail (LoD) 4. In the dataset, each building is represented through three distinct modalities: a semantically enriched 3D wireframe graph at LoD 4 (Modality I), the corresponding floor plan images (Modality II), and a LiDAR-like roof point cloud (Modality III). The semantic annotations for each building wireframe are derived from the corresponding floor plan images and include information on rooms, doors, and windows. Through its tri-modal nature, future work can use SYNBUILD-3D to develop novel generative AI algorithms that automate the creation of 3D building models at LoD 4, subject to predefined floor plan layouts and roof geometries, while enforcing semantic-geometric consistency. Dataset and code samples are publicly available at https://github.com/kdmayer/SYNBUILD-3D.
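To make the tri-modal structure concrete, the following is a minimal Python sketch of what one SYNBUILD-3D sample could look like. All class and field names here are illustrative assumptions, not the dataset's actual schema; consult the linked repository for the real data format.

```python
from dataclasses import dataclass, field

@dataclass
class WireframeGraph:
    """Modality I: semantically enriched 3D wireframe at LoD 4 (hypothetical layout)."""
    vertices: list   # (x, y, z) corner coordinates
    edges: list      # index pairs into `vertices`
    semantics: dict  # element -> label, e.g. "room", "door", "window"

@dataclass
class BuildingSample:
    """One synthetic residential building with its three modalities (hypothetical schema)."""
    building_id: str
    wireframe: WireframeGraph                            # Modality I
    floor_plan_paths: list = field(default_factory=list) # Modality II: one image per floor
    roof_points: list = field(default_factory=list)      # Modality III: LiDAR-like (x, y, z)

def semantic_labels(sample: BuildingSample) -> list:
    """Collect the distinct semantic classes annotated on the wireframe."""
    return sorted(set(sample.wireframe.semantics.values()))
```

The point of the sketch is the coupling: the wireframe's semantic labels (rooms, doors, windows) are derived from the floor plan modality, so a generative model trained on such samples can be conditioned on floor plans and roof point clouds while checking semantic-geometric consistency against the wireframe.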
Problem

Research questions and friction points this paper is trying to address.

Lack of large-scale annotated 3D building datasets
Automating creation of semantically rich 3D buildings
Generating accurate LoD4 models with semantic-geometric consistency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Synthetic dataset with multi-modal 3D building representations
Semantic annotations derived from floor plan images
Tri-modal data enabling generative AI algorithms