Beyond Paired Data: Self-Supervised UAV Geo-Localization from Reference Imagery Alone

📅 2025-12-02

🤖 AI Summary

Existing cross-view geo-localization methods for UAVs in GNSS-denied environments depend on large-scale paired UAV-satellite image datasets, which are costly to acquire and limit generalization. This paper proposes a self-supervised cross-view localization framework trained on satellite imagery alone. The method models the visual domain shift between satellite and UAV viewpoints without paired samples, which makes contrastive self-supervised training possible. The authors introduce CAEVL, a lightweight contrastive autoencoder architecture, together with viewpoint-aware data augmentation tailored to the geometric and appearance differences between the two views. Evaluated on ViLD, a newly released dataset of real-world UAV images, the approach reaches localization accuracy competitive with baselines trained on paired data, even though it sees no UAV imagery during training, and it generalizes well where paired training data are scarce. The result is an efficient, scalable approach to visual localization without GNSS.

📝 Abstract

Image-based localization in GNSS-denied environments is critical for UAV autonomy. Existing state-of-the-art approaches rely on matching UAV images to geo-referenced satellite images; however, they typically require large-scale, paired UAV-satellite datasets for training. Such data are costly to acquire and often unavailable, limiting their applicability. To address this challenge, we adopt a training paradigm that removes the need for UAV imagery during training by learning directly from satellite-view reference images. This is achieved through a dedicated augmentation strategy that simulates the visual domain shift between satellite and real-world UAV views. We introduce CAEVL, an efficient model designed to exploit this paradigm, and validate it on ViLD, a new and challenging dataset of real-world UAV images that we release to the community. Our method achieves competitive performance compared to approaches trained with paired data, demonstrating its effectiveness and strong generalization capabilities.
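The abstract does not list the individual augmentations, so the following is only a minimal sketch of what "simulating the satellite-to-UAV domain shift" could look like. It assumes three illustrative transforms (random crop for a narrower footprint, quarter-turn rotation for unknown heading, gain/bias jitter for lighting and sensor differences); the function names, parameter ranges, and the toy 8x8 grayscale tile are all hypothetical and are not taken from the paper.

```python
import random

def random_crop(img, size, rng):
    """Cut a size x size window at a random offset (a narrower, UAV-like footprint)."""
    h, w = len(img), len(img[0])
    if size > h or size > w:
        raise ValueError("crop size exceeds image size")
    top = rng.randint(0, h - size)
    left = rng.randint(0, w - size)
    return [row[left:left + size] for row in img[top:top + size]]

def rotate90(img, k):
    """Rotate a 2D grid clockwise by k quarter turns (unknown UAV heading)."""
    for _ in range(k % 4):
        img = [list(row) for row in zip(*img[::-1])]
    return img

def photometric_jitter(img, gain, bias):
    """Apply gain and bias to each pixel, clamped to [0, 1] (lighting/sensor change)."""
    return [[min(1.0, max(0.0, gain * p + bias)) for p in row] for row in img]

def simulate_uav_view(tile, crop_size, rng):
    """Turn one satellite tile into a pseudo-UAV view (assumed pipeline, not the paper's)."""
    view = random_crop(tile, crop_size, rng)
    view = rotate90(view, rng.randint(0, 3))
    gain = rng.uniform(0.7, 1.3)   # hypothetical range
    bias = rng.uniform(-0.1, 0.1)  # hypothetical range
    return photometric_jitter(view, gain, bias)

# Toy grayscale "satellite tile" with pixel values in [0, 1].
rng = random.Random(0)
tile = [[(r * 8 + c) / 63.0 for c in range(8)] for r in range(8)]
view = simulate_uav_view(tile, 4, rng)
```

Each call yields a different pseudo-UAV view of the same tile, so a (tile, view) pair can stand in for the paired UAV-satellite sample that this training paradigm avoids collecting.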

Problem

Research questions and friction points this paper is trying to address.

Enables UAV localization without GNSS using only satellite images

Eliminates need for paired UAV-satellite datasets during training

Simulates visual domain shift between satellite and UAV views
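At inference time, matching a UAV image against geo-referenced satellite imagery can be framed as nearest-neighbor retrieval over embeddings. The sketch below assumes embeddings have already been produced by some encoder and uses cosine similarity; the `localize` helper, the three-dimensional vectors, and the coordinates are made up for illustration and do not come from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0

def localize(query_embedding, reference_db):
    """Return the coordinates of the reference tile most similar to the query.

    reference_db is a list of (embedding, (lat, lon)) pairs built offline
    from geo-referenced satellite tiles.
    """
    best = max(reference_db, key=lambda entry: cosine(query_embedding, entry[0]))
    return best[1]

# Hypothetical reference database: embedding -> tile center coordinates.
reference_db = [
    ([1.0, 0.0, 0.0], (48.8566, 2.3522)),
    ([0.0, 1.0, 0.0], (48.8600, 2.3400)),
    ([0.0, 0.0, 1.0], (48.8500, 2.3600)),
]

uav_embedding = [0.9, 0.1, 0.0]  # stand-in for an encoded UAV frame
estimate = localize(uav_embedding, reference_db)
```

A linear scan is enough for a sketch; a real reference database would likely need an approximate nearest-neighbor index.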

Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-supervised learning from satellite images alone

Augmentation simulates satellite-to-UAV domain shift

Efficient CAEVL model validated on real-world UAV dataset
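One common way to train on (satellite tile, augmented view) pairs without labels is an InfoNCE-style contrastive loss, where each tile's augmented view is the positive and the other views in the batch are negatives. The page describes CAEVL as a contrastive autoencoder but does not state its exact objective, so this is a generic sketch under that assumption, with a hypothetical temperature of 0.1.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss; positives[i] is the matching view for anchors[i]."""
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        # Similarity of anchor i to every candidate view in the batch.
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(a)

# Toy embeddings: correctly matched pairs versus deliberately swapped pairs.
tiles = [[1.0, 0.0], [0.0, 1.0]]
matched_loss = info_nce(tiles, tiles)
swapped_loss = info_nce(tiles, tiles[::-1])
```

The loss is small when each tile embedding is closest to its own augmented view and large when the pairs are mismatched, which is the signal that pushes the encoder toward viewpoint-invariant features.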

🔎 Similar Papers

STHN: Deep Homography Estimation for UAV Thermal Geo-Localization With Satellite Imagery