Unifying UAV Cross-View Geo-Localization via 3D Geometric Perception

📅 2026-04-02


🤖 AI Summary

This work addresses cross-view geo-localization between oblique drone images and orthorectified satellite maps under GNSS-denied conditions, where large viewpoint discrepancies hinder accurate matching. The authors propose a geometry-aware unified framework that explicitly models 3D scene geometry. A Visual Geometry Grounded Transformer (VGGT) reconstructs a local 3D scene from multi-view UAV image sequences, and a virtual bird's-eye view (BEV) is rendered from that reconstruction. The approach integrates coarse-grained place retrieval and fine-grained 3-DoF pose regression within a single inference pipeline. It also introduces a Satellite-wise Attention Block that isolates each satellite candidate's interaction with the reconstructed UAV scene, suppressing interference among candidate locations. Evaluated on the recalibrated University-1652 benchmark and on SUES-200, the method achieves meter-level localization accuracy, significantly outperforming existing approaches and showing improved robustness and generalization in complex urban environments.
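The summary states that the reconstructed scene is rendered into a virtual BEV that aligns with the satellite map, but it does not describe the renderer. A minimal sketch of one common way to do this is an orthographic height-map rasterization that keeps the highest point falling in each ground cell (a z-buffer viewed from above). It assumes the VGGT point cloud has already been transformed into a metric, gravity-aligned, north-up frame; that transform is not shown here. The function `render_bev_height_map` and its parameters are hypothetical, not the authors' code.

```python
import math
from typing import List, Optional, Sequence, Tuple

# Hypothetical point format: (x, y, z) in metres, x east, y north, z up.
Point = Tuple[float, float, float]


def render_bev_height_map(
    points: Sequence[Point],
    origin: Tuple[float, float],
    resolution: float,
    size: int,
) -> List[List[Optional[float]]]:
    """Orthographically project points onto a size x size ground grid.

    origin is the (x, y) of the grid's north-west corner, resolution is the
    cell size in metres. Each cell keeps the highest z that lands in it;
    cells that receive no point stay None.
    """
    x0, y0 = origin
    grid: List[List[Optional[float]]] = [[None] * size for _ in range(size)]
    for x, y, z in points:
        col = math.floor((x - x0) / resolution)
        # Row 0 is the northern edge, matching a north-up satellite tile.
        row = math.floor((y0 - y) / resolution)
        if 0 <= row < size and 0 <= col < size:
            current = grid[row][col]
            if current is None or z > current:
                grid[row][col] = z
    return grid
```

For example, rendering `[(0.5, 9.5, 1.0), (0.6, 9.4, 4.0), (5.2, 2.1, 0.0)]` with `origin=(0.0, 10.0)`, `resolution=1.0` and `size=10` keeps only the taller of the two points in cell `[0][0]` (height 4.0) and places the third point in cell `[7][5]`. A full renderer would also carry color or learned features per cell, but keeping the top-most surface is what lets the oblique view line up with a nadir satellite image.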

📝 Abstract

Cross-view geo-localization for Unmanned Aerial Vehicles (UAVs) operating in GNSS-denied environments remains challenging due to the severe geometric discrepancy between oblique UAV imagery and orthographic satellite maps. Most existing methods address this problem through a decoupled pipeline of place retrieval and pose estimation, implicitly treating perspective distortion as appearance noise rather than an explicit geometric transformation. In this work, we propose a geometry-aware UAV geo-localization framework that explicitly models the 3D scene geometry to unify coarse place recognition and fine-grained pose estimation within a single inference pipeline. Our approach reconstructs a local 3D scene from multi-view UAV image sequences using a Visual Geometry Grounded Transformer (VGGT), and renders a virtual Bird's-Eye View (BEV) representation that orthorectifies the UAV perspective to align with satellite imagery. This BEV serves as a geometric intermediary that enables robust cross-view retrieval and provides spatial priors for accurate 3-Degrees-of-Freedom (3-DoF) pose regression. To efficiently handle multiple location hypotheses, we introduce a Satellite-wise Attention Block that isolates the interaction between each satellite candidate and the reconstructed UAV scene, preventing inter-candidate interference while maintaining linear computational complexity. In addition, we release a recalibrated version of the University-1652 dataset with precise coordinate annotations and spatial overlap analysis, enabling rigorous evaluation of end-to-end localization accuracy. Extensive experiments on the recalibrated University-1652 benchmark and SUES-200 demonstrate that our method significantly outperforms state-of-the-art baselines, achieving robust meter-level localization accuracy and improved generalization in complex urban environments.
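The abstract says the Satellite-wise Attention Block lets each satellite candidate interact with the reconstructed UAV scene but not with the other candidates, with linear complexity. The following is a minimal sketch of that isolation idea only, under several assumptions not stated in the source. It uses single-head scaled dot-product attention with Q = K = V (no learned projections, multi-head split, or MLP), and it keeps only the updated satellite tokens. It reads "linear" as linear in the number of candidates K: attending over K separate [candidate; UAV] sequences costs O(K (n_s + n_u)^2) instead of O((K n_s + n_u)^2) for one joint sequence. All function names are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]  # rows are tokens, columns are feature dimensions


def softmax(row: Sequence[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: Matrix) -> Matrix:
    """Single-head scaled dot-product self-attention with Q = K = V = tokens."""
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    out: Matrix = []
    for q in tokens:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out


def satellite_wise_attention(uav: Matrix, candidates: Sequence[Matrix]) -> List[Matrix]:
    """Attend each satellite candidate jointly with the UAV tokens, one candidate at a time.

    Candidate k never sees the tokens of any other candidate, so its output
    does not depend on which other candidates are present, and the total cost
    grows linearly with the number of candidates.
    """
    outputs: List[Matrix] = []
    for sat in candidates:
        joint = self_attention(sat + uav)  # sequence = [satellite tokens; UAV tokens]
        outputs.append(joint[: len(sat)])  # keep only the updated satellite tokens
    return outputs
```

A quick way to see the isolation property: with `uav = [[1.0, 0.0], [0.0, 1.0]]`, the output for a candidate `[[0.5, 0.5]]` is identical whether it is processed alone or alongside a second candidate `[[2.0, -1.0]]`. In a batched deep-learning implementation the same effect would typically come from a block-diagonal attention mask or from folding candidates into the batch dimension; which one the authors use is not stated.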

Problem

Research questions and friction points this paper is trying to address.

UAV geo-localization

cross-view matching

geometric discrepancy

GNSS-denied environments

3D scene geometry

Innovation

Methods, ideas, or system contributions that make the work stand out.

3D geometric perception

cross-view geo-localization

Visual Geometry Grounded Transformer