🤖 AI Summary
Problem: Accurate visual localization of unmanned aerial vehicles (UAVs) remains challenging in GNSS-denied environments because satellite imagery and UAV-captured aerial images differ substantially across domains: large spatiotemporal gaps, drastic viewpoint differences, and heterogeneous modalities (e.g., visible-light vs. infrared).
Method: This paper proposes a cross-view image-matching localization framework. A modern object detector first extracts multi-scale salient instances from the UAV and satellite images, and these instances form the nodes of a fine-grained object-level graph. A dedicated graph neural network (GNN) then jointly models intra-image and inter-image node relationships, and a learnable node-similarity metric explicitly mitigates modality discrepancies.
Contribution/Results: Experiments on public and real-world datasets show strong cross-domain image retrieval and geolocalization performance. The method remains robust and generalizes well under large modality gaps, such as infrared-visible matching, which supports its use for UAV localization without GNSS.
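To make the object-level graph idea concrete, the following is a minimal sketch of turning detector outputs into a graph. It is not the authors' implementation: the `Detection` type, the `build_object_graph` function, and the choice of connecting each node to its `k` nearest neighbours by box-center distance are all assumptions made for illustration, since the summary does not state how edges are built.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """Hypothetical detector output: a class label and an axis-aligned box."""
    label: str
    box: tuple  # (x_min, y_min, x_max, y_max) in pixels

    @property
    def center(self):
        x0, y0, x1, y1 = self.box
        return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)


def build_object_graph(detections, k=2):
    """Return {node index: [indices of its k nearest detections]}.

    Each detection becomes a node. The k-nearest-neighbour rule on box
    centers is an assumed edge heuristic, not taken from the paper.
    """
    graph = {}
    for i, det in enumerate(detections):
        dists = [
            (math.dist(det.center, other.center), j)
            for j, other in enumerate(detections)
            if j != i
        ]
        dists.sort()  # nearest first; ties broken by node index
        graph[i] = [j for _, j in dists[:k]]
    return graph


detections = [
    Detection("building", (0, 0, 2, 2)),
    Detection("road", (10, 0, 12, 2)),
    Detection("tree", (0, 10, 2, 12)),
    Detection("car", (1, 1, 3, 3)),
]
graph = build_object_graph(detections, k=1)
```

In a full pipeline, each node would also carry an appearance feature from the detector, and a GNN would pass messages along these edges (intra-image) and between the UAV and satellite graphs (inter-image).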
📝 Abstract
With the rapid growth of the low-altitude economy, UAVs have become crucial for measurement and tracking in patrol systems. However, in GNSS-denied areas, satellite-based localization methods are prone to failure. This paper presents a cross-view UAV localization framework that performs map matching via object detection, aimed at effectively addressing cross-temporal, cross-view, heterogeneous aerial image matching. In typical pipelines, UAV visual localization is formulated as an image-retrieval problem: features are extracted to build a localization map, and the pose of a query image is estimated by matching it to a reference database with known poses. Because publicly available UAV localization datasets are limited, many approaches recast localization as a classification task and rely on scene labels in these datasets to ensure accuracy. Other methods seek to reduce cross-domain differences using polar-coordinate reprojection, perspective transformations, or generative adversarial networks; however, they can suffer from misalignment, content loss, and limited realism. In contrast, we leverage modern object detection to accurately extract salient instances from UAV and satellite images, and integrate a graph neural network to reason about inter-image and intra-image node relationships. Using a fine-grained, graph-based node-similarity metric, our method achieves strong retrieval and localization performance. Extensive experiments on public and real-world datasets show that our approach handles heterogeneous appearance differences effectively and generalizes well, making it applicable to scenarios with larger modality gaps, such as infrared-visible image matching. Our dataset will be publicly available at the following URL: https://github.com/liutao23/ODGNNLoc.git.
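The retrieval formulation described above can be sketched as follows: a query graph is scored against every reference graph, and the known position of the best-scoring reference is returned. This is an illustrative sketch only. The function names, the toy feature vectors, and the fixed cosine-based scoring rule are assumptions; the paper uses a learned graph-based node-similarity metric, which this hand-written rule merely stands in for.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def graph_similarity(query_nodes, ref_nodes):
    """Average, over query nodes, of the best cosine match among reference nodes.

    An assumed stand-in for the learned node-similarity metric.
    """
    if not query_nodes or not ref_nodes:
        return 0.0
    best = [max(cosine(q, r) for r in ref_nodes) for q in query_nodes]
    return sum(best) / len(best)


def localize(query_nodes, reference_db):
    """Return (position, score) of the reference entry most similar to the query.

    reference_db is a non-empty list of (position, node_features) pairs,
    where position is the known location of that reference image.
    """
    scored = [(graph_similarity(query_nodes, nodes), pos) for pos, nodes in reference_db]
    best_score, best_pos = max(scored, key=lambda item: item[0])
    return best_pos, best_score


# Toy data: two reference tiles with made-up positions and 2-D node features.
reference_db = [
    ((30.0, 120.0), [[1.0, 0.0], [0.0, 1.0]]),
    ((31.0, 121.0), [[1.0, 1.0]]),
]
query_nodes = [[1.0, 0.0], [0.0, 1.0]]
position, score = localize(query_nodes, reference_db)
```

Matching at the node level rather than on a single global descriptor is what lets the score tolerate appearance changes that affect only part of the scene, although the real method learns this comparison instead of fixing it by hand.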