SkyLink: Unifying Street-Satellite Geo-Localization via UAV-Mediated 3D Scene Alignment

📅 2025-09-29
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Cross-view geolocalization suffers from semantic degradation caused by extreme viewpoint discrepancies, limiting the performance of conventional direct feature-matching approaches. To address this, we propose leveraging multi-scale UAV-captured 3D scenes as an intermediate semantic bridge between street-level and satellite imagery. Our method employs self-supervised and cross-view contrastive learning to enhance feature alignment; integrates a retrieval-augmented module to improve street-view quality; introduces a patch-aware feature aggregation mechanism to strengthen local consistency; and incorporates multi-scale UAV-derived 3D geometric priors to enable robust cross-modal matching. Evaluated on the University-1652 benchmark, our approach achieves a Recall@1 of 25.75%, demonstrating significant improvements in generalization and robustness under severe viewpoint variations. This work establishes a novel, interpretable, and geometry-aware paradigm for cross-view geolocalization.
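The cross-view contrastive learning the summary mentions can be sketched as an InfoNCE-style objective: each street-view embedding should score highest against the satellite (or UAV-bridged) embedding at the same index, with all other images in the batch acting as negatives. This is a minimal, illustrative sketch, not the paper's actual implementation; the function names and the temperature value are assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def info_nce(street_feats, satellite_feats, temperature=0.07):
    """InfoNCE-style contrastive loss: the i-th street embedding is a
    positive pair with the i-th satellite embedding; all other satellite
    embeddings in the batch are negatives. Returns the mean loss."""
    loss = 0.0
    n = len(street_feats)
    for i, s in enumerate(street_feats):
        logits = [cosine(s, t) / temperature for t in satellite_feats]
        m = max(logits)  # subtract max for numerical stability
        denom = sum(math.exp(l - m) for l in logits)
        loss += -(logits[i] - m - math.log(denom))
    return loss / n
```

With correctly paired embeddings the diagonal dominates the similarity matrix and the loss approaches zero; shuffling the satellite side makes the positives off-diagonal and the loss grows, which is what drives the two views toward a shared feature space.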

📝 Abstract
Cross-view geo-localization aims to establish location correspondences between different viewpoints. Existing approaches typically learn cross-view correlations through direct feature-similarity matching, often overlooking the semantic degradation caused by extreme viewpoint disparities. To address this problem, we focus on robust feature retrieval under viewpoint variation and propose the novel SkyLink method. We first use the Google Retrieval Enhancement Module to perform data enhancement on street images, which mitigates occlusion of the key target caused by restricted street viewpoints. The Patch-Aware Feature Aggregation module is further adopted to emphasize multiple local feature aggregations, ensuring consistent feature extraction across viewpoints. Meanwhile, we integrate 3D scene information constructed from multi-scale UAV images as a bridge between street and satellite viewpoints, and perform feature alignment through self-supervised and cross-view contrastive learning. Experimental results demonstrate robustness and generalization across diverse urban scenarios, achieving 25.75% Recall@1 accuracy on University-1652 in the UAVM2025 Challenge. Code will be released at https://github.com/HRT00/CVGL-3D.
Problem

Research questions and friction points this paper is trying to address.

Addresses cross-view geo-localization with extreme viewpoint disparities
Mitigates semantic degradation in street-satellite image matching
Unifies street-satellite localization via UAV-mediated 3D scene alignment
Innovation

Methods, ideas, or system contributions that make the work stand out.

UAV-mediated 3D scene alignment bridges street-satellite views
Google Retrieval Enhancement mitigates street view occlusion
Patch-Aware Feature Aggregation ensures cross-view consistency
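The patch-aware aggregation idea above can be illustrated with a simple weighted pooling scheme: pool per-patch features into one image descriptor, weighting each patch by the softmax of its feature norm so salient local regions dominate. This is a minimal sketch under that assumption; the paper's actual aggregation mechanism is not specified on this page.

```python
import math

def patch_aware_aggregate(patch_feats):
    """Aggregate a list of per-patch feature vectors into one descriptor.
    Each patch is weighted by the softmax of its L2 norm, so patches with
    stronger responses contribute more to the pooled feature."""
    norms = [math.sqrt(sum(x * x for x in p)) for p in patch_feats]
    m = max(norms)  # subtract max for numerical stability
    exps = [math.exp(n - m) for n in norms]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(patch_feats[0])
    return [sum(w * p[d] for w, p in zip(weights, patch_feats))
            for d in range(dim)]
```

Because the weights sum to one, identical patches pool to themselves, while a single high-norm patch (e.g. a distinctive building) dominates the descriptor, which is the kind of local consistency the bullet point describes.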
Hongyang Zhang
Assistant Professor of Computer Science, University of Waterloo
Machine Learning · Inference Acceleration · AI Security
Yinhao Liu
Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, Xiamen, Fujian, China
Zhenyu Kuang
School of Electronic and Information Engineering, Foshan University, Foshan, Guangdong, China