🤖 AI Summary
This work addresses radio map estimation under extremely sparse spatial sampling (only 0.01% of pixels observable) in realistic scenarios. We propose RadioFormer, a multi-granularity Transformer architecture built on two attention modules. A dual-stream self-attention (DSA) module models fine-grained, pixel-level correlations in observed signal power in one stream and coarse-grained, patch-level building geometry in the other. A cross-stream cross-attention (CCA) module then fuses the two streams into multi-scale representations of the radio map. The architecture generalizes well, including in zero-shot settings, and remains robust at ultra-low sampling rates. Evaluated on the RadioMapSeer benchmark, our method outperforms state-of-the-art methods in accuracy while incurring the lowest computational cost among them, giving a favorable trade-off among reconstruction fidelity, inference efficiency, and cross-scenario generalizability.
📝 Abstract
Radio map estimation aims to generate a dense representation of electromagnetic spectrum quantities, such as the received signal strength at each grid point within a geographic region, from measurements taken at a subset of spatially distributed nodes (represented as pixels). Recently, deep vision models such as the U-Net have been adapted to radio map estimation. Their effectiveness can be guaranteed when each map has sufficient spatial observations (typically 0.01% to 1% of pixels) from which to model the local dependency of observed signal power. However, such a setting is often impractical in real-world scenarios, where extreme sparsity in spatial sampling is widely encountered. To address this challenge, we propose RadioFormer, a novel multiple-granularity transformer designed to handle the constraints posed by spatially sparse observations. Through a dual-stream self-attention (DSA) module, RadioFormer separately discovers the correlations among pixel-wise observed signal power and learns patch-wise building geometries, each at its own granularity. A cross-stream cross-attention (CCA) module then integrates the two into multi-scale representations of radio maps. Extensive experiments on the public RadioMapSeer dataset demonstrate that RadioFormer outperforms state-of-the-art methods in radio map estimation while maintaining the lowest computational cost. Furthermore, the proposed approach exhibits exceptional generalization capabilities and robust zero-shot performance, underscoring its potential to advance radio map estimation in a more practical setting with very limited observation nodes.
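The two-module design described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes plain scaled dot-product attention without learned projections, multiple heads, or positional encodings. The embedding width, the token counts, the 256x256 map size used to derive the number of observed pixels, and the choice of patch features as queries in the cross-attention step are all hypothetical and chosen only to show the data flow.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Scaled dot-product attention: q is (n, d), k and v are (m, d); returns (n, d)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

rng = np.random.default_rng(0)
d = 16          # hypothetical embedding width
n_obs = 7       # assumption: about 0.01% of a 256x256 map (65536 * 1e-4 is roughly 6.6)
n_patches = 64  # hypothetical: an 8x8 grid of building-map patches

# Stream 1: one token per observed pixel (random stand-ins for real embeddings).
pixel_tokens = rng.normal(size=(n_obs, d))
# Stream 2: one token per building-geometry patch (random stand-ins as well).
patch_tokens = rng.normal(size=(n_patches, d))

# Dual-stream self-attention: each stream attends only to its own tokens.
pixel_feats = attention(pixel_tokens, pixel_tokens, pixel_tokens)
patch_feats = attention(patch_tokens, patch_tokens, patch_tokens)

# Cross-stream cross-attention (assumed direction): patch queries read from
# pixel keys and values, so every patch feature is informed by the sparse
# signal measurements.
fused = attention(patch_feats, pixel_feats, pixel_feats)

print(fused.shape)
```

With these hypothetical sizes, the self-attention over the pixel stream involves only a 7x7 score matrix, which hints at why treating sparse observations as a short token sequence can be cheap compared with convolving over a full dense map.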