🤖 AI Summary
Cross-view geolocalization suffers from semantic degradation caused by extreme viewpoint discrepancies, limiting the performance of conventional direct feature-matching approaches. To address this, we propose leveraging multi-scale UAV-captured 3D scenes as an intermediate semantic bridge between street-level and satellite imagery. Our method employs self-supervised and cross-view contrastive learning to enhance feature alignment; integrates a retrieval-augmented module to improve street-view quality; introduces a patch-aware feature aggregation mechanism to strengthen local consistency; and incorporates multi-scale UAV-derived 3D geometric priors to enable robust cross-modal matching. Evaluated on the University-1652 benchmark, our approach achieves a Recall@1 of 25.75%, demonstrating significant improvements in generalization and robustness under severe viewpoint variations. This work establishes a novel, interpretable, and geometry-aware paradigm for cross-view geolocalization.
📝 Abstract
Cross-view geo-localization aims to establish location correspondences between different viewpoints. Existing approaches typically learn cross-view correlations through direct feature similarity matching, often overlooking the semantic degradation caused by extreme viewpoint disparities. To address this problem, we focus on robust feature retrieval under viewpoint variation and propose the novel SkyLink method. We first apply the Google Retrieval Enhancement Module to augment street images, which mitigates occlusion of key targets caused by restricted street viewpoints. A Patch-Aware Feature Aggregation module is further adopted to emphasize multiple local feature aggregations, ensuring consistent feature extraction across viewpoints. Meanwhile, we integrate 3D scene information reconstructed from multi-scale UAV images as a bridge between street and satellite viewpoints, and perform feature alignment through self-supervised and cross-view contrastive learning. Experimental results demonstrate robustness and generalization across diverse urban scenarios, achieving 25.75% Recall@1 on University-1652 in the UAVM2025 Challenge. Code will be released at https://github.com/HRT00/CVGL-3D.
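The cross-view contrastive learning mentioned above typically pulls paired embeddings (e.g. a street image and its matching satellite tile) together while pushing non-matching pairs apart. As a rough illustration only (the paper's actual loss and hyperparameters are not specified here; function names, the temperature value, and the symmetric InfoNCE formulation are assumptions), a minimal NumPy sketch of such a loss could look like:

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def cross_view_infonce(street_emb, sat_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired cross-view embeddings.

    street_emb, sat_emb: (B, D) arrays; row i of each is a matching pair.
    Hypothetical sketch -- not the released SkyLink implementation.
    """
    s = l2_normalize(street_emb)
    t = l2_normalize(sat_emb)
    logits = s @ t.T / temperature        # (B, B) cosine-similarity matrix
    idx = np.arange(logits.shape[0])      # positives sit on the diagonal

    def cross_entropy(lg):
        # numerically stable log-softmax per row
        lg = lg - lg.max(axis=1, keepdims=True)
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -log_probs[idx, idx].mean()

    # average the street->satellite and satellite->street directions
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

In practice such a loss would be computed on learned encoder outputs with gradient descent (e.g. in PyTorch); the NumPy version above only shows the objective itself. Well-aligned pairs drive the diagonal logits up and the loss toward zero, while mismatched batches score near log(B).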