AI Summary
Urban Digital Twin (UDT) development has long been hindered by the absence of end-to-end benchmark datasets covering the full pipeline: from data acquisition and high-fidelity modeling to dynamic updating and downstream task validation. Existing datasets are typically limited to single modalities or isolated processing stages. To address this, we introduce the first large-scale, multimodal UDT benchmark: a 100,000 m² urban area featuring georeferenced and semantically aligned Level-of-Detail 3 (LoD3) 3D models, alongside 32 heterogeneous observation modalities (including ground-level, mobile, aerial, and satellite imagery) totaling 767 GB. This benchmark establishes the first unified indoor-outdoor georegistration, cross-modal semantic alignment, and LoD3 annotation framework. The dataset is publicly released and demonstrates significant improvements: +2.1 dB PSNR in NeRF novel-view synthesis and +8.3% IoU in building reconstruction. It further enables diverse downstream tasks, including solar potential analysis and point cloud segmentation, thereby filling a critical gap in standardized UDT evaluation.
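For readers unfamiliar with the two metrics quoted above, a minimal sketch of how PSNR (novel-view synthesis quality) and IoU (reconstruction overlap) are typically computed; the function names and toy inputs here are illustrative, not part of the TUM2TWIN tooling:

```python
import numpy as np

def psnr(ref, img, max_val=1.0):
    """Peak signal-to-noise ratio in dB between a reference and a rendered image."""
    mse = np.mean((ref - img) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

def iou(pred, gt):
    """Intersection-over-union of two boolean masks (e.g., building footprints)."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union

# Toy example: a uniform 0.1 error gives MSE = 0.01, hence PSNR = 20 dB.
ref = np.zeros((4, 4))
img = np.full((4, 4), 0.1)
print(round(psnr(ref, img), 2))  # → 20.0

# Two masks overlapping in 1 of 3 occupied cells: IoU = 1/3.
a = np.array([[1, 1], [0, 0]], dtype=bool)
b = np.array([[1, 0], [1, 0]], dtype=bool)
print(round(iou(a, b), 3))  # → 0.333
```

A "+2.1 dB PSNR" gain thus means the rendered views' mean squared error dropped by roughly 38% relative to the baseline, since PSNR is logarithmic in MSE.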
Abstract
Urban Digital Twins (UDTs) have become essential for managing cities and integrating complex, heterogeneous data from diverse sources. Creating UDTs involves challenges at multiple process stages, including acquiring accurate 3D source data, reconstructing high-fidelity 3D models, keeping models up to date, and ensuring seamless interoperability with downstream tasks. Current datasets are usually limited to one part of the processing chain, hampering comprehensive UDT validation. To address these challenges, we introduce the first comprehensive multimodal Urban Digital Twin benchmark dataset: TUM2TWIN. This dataset includes georeferenced, semantically aligned 3D models and networks along with various terrestrial, mobile, aerial, and satellite observations, comprising 32 data subsets over roughly 100,000 $m^2$ and currently 767 GB of data. By ensuring georeferenced indoor-outdoor acquisition, high accuracy, and multimodal data integration, the benchmark supports robust analysis of sensors and the development of advanced reconstruction methods. Additionally, we explore downstream tasks demonstrating the potential of TUM2TWIN, including novel view synthesis with NeRF and Gaussian Splatting, solar potential analysis, point cloud semantic segmentation, and LoD3 building reconstruction. We are convinced this contribution lays a foundation for overcoming current limitations in UDT creation, fostering new research directions and practical solutions for smarter, data-driven urban environments. The project is available under: https://tum2t.win