GeoDistill: Geometry-Guided Self-Distillation for Weakly Supervised Cross-View Localization

📅 2025-07-14

🤖 AI Summary

Cross-view localization aligns ground-level images with overhead satellite imagery to estimate a camera's 3-DoF pose, but existing fully supervised methods depend heavily on ground-truth pose annotations, which are costly to collect and therefore scarce. This paper proposes GeoDistill, a geometry-guided, weakly supervised self-distillation framework with three components: (i) a field-of-view (FoV) masking mechanism within a teacher–student architecture, in which the teacher localizes the full panorama while the student localizes a limited-FoV version of it; (ii) a distillation objective that aligns the student's predictions with the teacher's; and (iii) an orientation estimation network that predicts relative orientation without precise planar position labels. Training the student this way pushes it toward salient structural cues such as lane lines and away from textureless regions such as roads, which reduces the reliance on accurate pose supervision. The method improves localization performance across different frameworks, works with both panoramic and limited-FoV queries, and yields lower prediction uncertainty.
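To make the FoV masking idea concrete, here is a minimal Python sketch of one way to derive a limited-FoV view from a panorama. It assumes an equirectangular panorama whose columns span 360 degrees of yaw and simply blanks every column outside a chosen horizontal FoV. The function names `fov_column_mask` and `apply_fov_mask`, the column-to-yaw convention, and the fill value are hypothetical and are not taken from the GeoDistill code.

```python
def fov_column_mask(width, fov_deg, center_deg):
    """Return one boolean per panorama column: True if the column lies inside a
    horizontal FoV of `fov_deg` degrees centred on yaw `center_deg`.

    Hypothetical convention: the `width` columns of an equirectangular panorama
    span 360 degrees of yaw, and column i is centred at yaw (i + 0.5) * 360 / width.
    """
    if not 0 < fov_deg <= 360:
        raise ValueError("fov_deg must be in (0, 360]")
    half_fov = fov_deg / 2.0
    mask = []
    for col in range(width):
        yaw = (col + 0.5) * 360.0 / width
        # Signed angular difference to the FoV centre, wrapped to [-180, 180)
        diff = (yaw - center_deg + 180.0) % 360.0 - 180.0
        mask.append(abs(diff) <= half_fov)
    return mask


def apply_fov_mask(panorama, mask, fill=0):
    """Replace every pixel in columns outside the FoV with `fill`.

    `panorama` is a list of rows; each row is a list of pixel values.
    """
    return [[px if keep else fill for px, keep in zip(row, mask)] for row in panorama]


# Example: an 8-column toy panorama masked to a 90-degree FoV facing yaw 0.
pano = [[1, 2, 3, 4, 5, 6, 7, 8],
        [9, 10, 11, 12, 13, 14, 15, 16]]
mask = fov_column_mask(width=8, fov_deg=90, center_deg=0)
student_view = apply_fov_mask(pano, mask)
print(student_view)  # [[1, 0, 0, 0, 0, 0, 0, 8], [9, 0, 0, 0, 0, 0, 0, 16]]
```

In a teacher–student setup of this kind, the teacher would receive `pano` and the student `student_view`. Blanking columns keeps the input size fixed, so one network architecture can process both views; cropping to the FoV is an equally plausible alternative, and this summary does not say which one GeoDistill uses or how it chooses the FoV.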

📝 Abstract

Cross-view localization, the task of estimating a camera's 3-degrees-of-freedom (3-DoF) pose by aligning ground-level images with satellite images, is crucial for large-scale outdoor applications such as autonomous navigation and augmented reality. Existing methods often rely on fully supervised learning, which requires costly ground-truth pose annotations. In this work, we propose GeoDistill, a Geometry-guided weakly supervised self-distillation framework that uses teacher-student learning with Field-of-View (FoV)-based masking to enhance local feature learning for robust cross-view localization. In GeoDistill, the teacher model localizes a panoramic image, while the student model predicts locations from a limited-FoV counterpart created by FoV-based masking. By aligning the student's predictions with those of the teacher, the student learns to focus on key features such as lane lines and to ignore textureless regions such as roads. This results in more accurate predictions and reduced uncertainty, regardless of whether the query images are panoramas or limited-FoV images. Our experiments show that GeoDistill significantly improves localization performance across different frameworks. Additionally, we introduce a novel orientation estimation network that predicts relative orientation without requiring precise planar position ground truth. GeoDistill provides a scalable and efficient solution for real-world cross-view localization challenges. Code and model can be found at https://github.com/tongshw/GeoDistill.
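The abstract does not spell out the alignment objective. A common choice in self-distillation, assumed here rather than taken from the paper, is to treat each model's output as a score map over candidate locations on the satellite image, turn it into a probability distribution with a softmax, and minimize the KL divergence from the teacher's distribution to the student's. The entropy of such a distribution is one way to quantify the "uncertainty" the abstract mentions: a sharply peaked map has low entropy. The sketch below uses flat Python lists in place of 2-D score maps, and all function names are hypothetical.

```python
import math


def softmax(scores, temperature=1.0):
    """Convert raw location scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def entropy(p):
    """Shannon entropy in nats; lower means a more confident prediction."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def distillation_loss(teacher_scores, student_scores, temperature=1.0):
    """Hypothetical teacher-to-student alignment loss over candidate locations.

    teacher_scores: scores from the teacher on the full panorama.
    student_scores: scores from the student on the limited-FoV input.
    Both are flattened score maps over the same candidate locations.
    """
    p_teacher = softmax(teacher_scores, temperature)  # treated as the target
    p_student = softmax(student_scores, temperature)
    return kl_divergence(p_teacher, p_student)


# Example: the student is less certain than the teacher about the same location.
teacher = [4.0, 1.0, 0.5, 0.0]
student = [1.5, 1.0, 0.5, 0.0]
loss = distillation_loss(teacher, student)
print(f"KL(teacher || student) = {loss:.4f}")
print(f"teacher entropy = {entropy(softmax(teacher)):.4f}, "
      f"student entropy = {entropy(softmax(student)):.4f}")
```

In a real training loop, the teacher's distribution would typically be treated as a fixed target (no gradient), so that minimizing this loss only updates the student and pulls its location map toward the teacher's sharper one.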

Problem

Research questions and friction points this paper is trying to address.

Estimating camera pose without costly ground-truth annotations

Enhancing cross-view localization using self-distillation and FoV masking

Predicting orientation without precise planar position ground truth

Innovation

Methods, ideas, or system contributions that make the work stand out.

Geometry-guided self-distillation for localization

FoV-based masking enhances local feature learning

Novel orientation estimation without precise planar position ground truth
