🤖 AI Summary
To address the insufficient mapping accuracy of open-source maps—caused by human annotation errors and delayed dynamic updates—this paper proposes a multimodal vision transformer framework that jointly fuses radio frequency (RF) path loss measurements and open-source map imagery. It is the first work to incorporate DINOv2 for urban building mapping, enabling joint modeling of spatial structural priors and RF signal propagation characteristics. The method achieves end-to-end building layout reconstruction solely from aggregated RF path loss data, without requiring expensive remote sensing inputs or manual annotations. Evaluated on a synthetic dataset, it achieves a macro IoU of 65.3%, substantially outperforming an erroneous-map baseline (40.1%), an RF-only baseline (37.3%), and a non-AI fusion baseline (42.2%). It also surpasses all baselines on the Jaccard index, Hausdorff distance, and Chamfer distance. This work establishes a new paradigm for low-cost, high-robustness environmental perception in smart cities.
📝 Abstract
Environment mapping is an important computing task for a wide range of smart city applications, including autonomous navigation, wireless network operations, and extended reality environments. Conventional smart city mapping techniques, such as satellite imagery, LiDAR scans, and manual annotations, often suffer from limitations related to cost, accessibility, and accuracy. Open-source mapping platforms have been widely used as a source of ground truth for artificial intelligence applications in environment mapping. However, human errors and the evolving nature of real-world environments introduce biases that can degrade the performance of neural networks trained on such data. In this paper, we present a deep learning-based approach that integrates the DINOv2 architecture to improve building mapping by combining maps from open-source platforms with radio frequency (RF) data collected from multiple wireless user equipments (UEs) and base stations. Our approach leverages a vision transformer-based architecture to jointly process the RF and map modalities within a unified framework, effectively capturing spatial dependencies and structural priors for enhanced mapping accuracy. For evaluation, we employ a synthetic dataset co-produced by Huawei. We develop and train a model that leverages only aggregated path loss information to tackle the mapping problem. We measure the results with three performance metrics that capture different qualities: (i) the Jaccard index, also known as intersection over union (IoU); (ii) the Hausdorff distance; and (iii) the Chamfer distance. Our design achieves a macro IoU of 65.3%, significantly surpassing (i) the erroneous-maps baseline, which yields 40.1%; (ii) an RF-only method from the literature, which yields 37.3%; and (iii) a non-AI fusion baseline of our own design, which yields 42.2%.
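The three evaluation metrics are standard measures of mask and shape agreement. As a rough illustration (not code from the paper), the minimal NumPy sketch below computes IoU, symmetric Hausdorff distance, and Chamfer distance for a toy pair of binary building footprints; the grid size and mask shapes are invented for the example.

```python
import numpy as np

def iou(pred, gt):
    """Jaccard index (intersection over union) of two binary masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union else 1.0

def _pairwise_dists(a, b):
    # Euclidean distance between every point in a (N,2) and every point in b (M,2).
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)

def hausdorff(a, b):
    """Symmetric Hausdorff distance: worst-case nearest-neighbour distance."""
    d = _pairwise_dists(a, b)
    return max(d.min(axis=1).max(), d.min(axis=0).max())

def chamfer(a, b):
    """Chamfer distance: mean nearest-neighbour distance, summed over both directions."""
    d = _pairwise_dists(a, b)
    return d.min(axis=1).mean() + d.min(axis=0).mean()

# Toy example: ground-truth vs. predicted building footprint on a 6x6 grid.
gt = np.zeros((6, 6), dtype=bool)
gt[1:4, 1:4] = True            # 3x3 ground-truth building
pred = np.zeros((6, 6), dtype=bool)
pred[2:5, 2:5] = True          # same footprint, shifted by (1, 1)

pts_gt = np.argwhere(gt).astype(float)     # occupied cells as point sets
pts_pred = np.argwhere(pred).astype(float)

print(f"IoU:       {iou(pred, gt):.3f}")                   # 4/14 ≈ 0.286
print(f"Hausdorff: {hausdorff(pts_pred, pts_gt):.3f}")
print(f"Chamfer:   {chamfer(pts_pred, pts_gt):.3f}")
```

IoU penalizes any area mismatch, Hausdorff captures the single worst boundary error, and Chamfer averages boundary errors, which is why the paper reports all three rather than relying on one.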