🤖 AI Summary
To address the insufficient mapping accuracy of open-source maps—caused by human annotation errors and delayed dynamic updates—this paper proposes a multimodal vision transformer framework that jointly fuses radio frequency (RF) path loss measurements and open-source map imagery. It is the first work to incorporate DINOv2 for urban building mapping, enabling joint modeling of spatial structural priors and RF signal propagation characteristics. The method achieves end-to-end building layout reconstruction solely from aggregated RF path loss data, without requiring expensive remote sensing inputs or manual annotations. Evaluated on a synthetic dataset, it achieves a macro IoU of 65.3%, substantially outperforming an erroneous-map baseline (40.1%), an RF-only baseline (37.3%), and a non-AI fusion baseline (42.2%). It also surpasses all baselines on the Jaccard index, Hausdorff distance, and Chamfer distance. This work establishes a new paradigm for low-cost, high-robustness environmental perception in smart cities.
📝 Abstract
Environment mapping is an important computing task for a wide range of smart city applications, including autonomous navigation, wireless network operations, and extended reality environments. Conventional smart city mapping techniques, such as satellite imagery, LiDAR scans, and manual annotations, often suffer from limitations related to cost, accessibility, and accuracy. Open-source mapping platforms have been widely used as a source of ground truth for artificial intelligence applications in environment mapping. However, human errors and the evolving nature of real-world environments introduce biases that can degrade the performance of neural networks trained on such data. In this paper, we present a deep learning-based approach that integrates the DINOv2 architecture to improve building mapping by combining maps from open-source platforms with radio frequency (RF) data collected from multiple wireless user equipments (UEs) and base stations. Our approach leverages a vision transformer-based architecture to jointly process the RF and map modalities within a unified framework, effectively capturing spatial dependencies and structural priors for enhanced mapping accuracy. For evaluation, we employ a synthetic dataset co-produced by Huawei. We develop and train a model that leverages only aggregated path loss information to tackle the mapping problem. We measure the results with three performance metrics that capture different qualities: (i) the Jaccard index, also known as intersection over union (IoU); (ii) the Hausdorff distance; and (iii) the Chamfer distance. Our design achieves a macro IoU of 65.3%, significantly surpassing (i) the erroneous-maps baseline, which yields 40.1%; (ii) an RF-only method from the literature, which yields 37.3%; and (iii) a non-AI fusion baseline of our own design, which yields 42.2%.
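The three evaluation metrics are standard measures of mask and shape agreement. As a rough illustration (not code from the paper), the minimal NumPy sketch below computes IoU, symmetric Hausdorff distance, and Chamfer distance for a toy pair of binary building footprints; the grid size and mask shapes are invented for the example.

```python
import numpy as np

def iou(pred, gt):
    """Jaccard index (intersection over union) of two binary masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union else 1.0

def _pairwise_dists(a, b):
    # Euclidean distance between every point in a (N,2) and every point in b (M,2).
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)

def hausdorff(a, b):
    """Symmetric Hausdorff distance: worst-case nearest-neighbour distance."""
    d = _pairwise_dists(a, b)
    return max(d.min(axis=1).max(), d.min(axis=0).max())

def chamfer(a, b):
    """Chamfer distance: mean nearest-neighbour distance, summed over both directions."""
    d = _pairwise_dists(a, b)
    return d.min(axis=1).mean() + d.min(axis=0).mean()

# Toy example: ground-truth vs. predicted building footprint on a 6x6 grid.
gt = np.zeros((6, 6), dtype=bool)
gt[1:4, 1:4] = True            # 3x3 ground-truth building
pred = np.zeros((6, 6), dtype=bool)
pred[2:5, 2:5] = True          # same footprint, shifted by (1, 1)

pts_gt = np.argwhere(gt).astype(float)     # occupied cells as point sets
pts_pred = np.argwhere(pred).astype(float)

print(f"IoU:       {iou(pred, gt):.3f}")                   # 4/14 ≈ 0.286
print(f"Hausdorff: {hausdorff(pts_pred, pts_gt):.3f}")
print(f"Chamfer:   {chamfer(pts_pred, pts_gt):.3f}")
```

IoU penalizes any area mismatch, Hausdorff captures the single worst boundary error, and Chamfer averages boundary errors, which is why the paper reports all three rather than relying on one.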