GeoDistill: Geometry-Guided Self-Distillation for Weakly Supervised Cross-View Localization

2025-07-14
Citations: 0
Influential: 0
AI Summary
Cross-view localization aims to align ground-level images with overhead satellite imagery to estimate 3-DoF camera pose, yet existing fully supervised methods heavily rely on costly and scarce ground-truth pose annotations. This paper proposes a geometry-guided weakly supervised self-distillation framework: (i) a field-of-view (FoV) masking mechanism enables local feature self-supervision within a teacher–student architecture; (ii) a geometric consistency loss enforces alignment in feature space; and (iii) a relative orientation estimation module operates without precise position labels. The method significantly reduces dependence on accurate pose supervision while enhancing modeling of salient structural cues. It achieves state-of-the-art performance across multiple benchmarks, supports both panoramic and narrow-field inputs, yields lower pose uncertainty, exhibits strong generalization, and demonstrates practical deployability.

๐Ÿ“ Abstract
Cross-view localization, the task of estimating a camera's 3-degrees-of-freedom (3-DoF) pose by aligning ground-level images with satellite images, is crucial for large-scale outdoor applications like autonomous navigation and augmented reality. Existing methods often rely on fully supervised learning, which requires costly ground-truth pose annotations. In this work, we propose GeoDistill, a Geometry-guided weakly supervised self-distillation framework that uses teacher-student learning with Field-of-View (FoV)-based masking to enhance local feature learning for robust cross-view localization. In GeoDistill, the teacher model localizes a panoramic image, while the student model predicts locations from a limited-FoV counterpart created by FoV-based masking. By aligning the student's predictions with those of the teacher, the student learns to focus on key features such as lane lines and to ignore textureless regions such as roads. This results in more accurate predictions and reduced uncertainty, regardless of whether the query images are panoramas or limited-FoV images. Our experiments show that GeoDistill significantly improves localization performance across different frameworks. Additionally, we introduce a novel orientation estimation network that predicts relative orientation without requiring precise planar position ground truth. GeoDistill provides a scalable and efficient solution for real-world cross-view localization challenges. Code and model can be found at https://github.com/tongshw/GeoDistill.
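
The FoV-based masking described in the abstract can be illustrated with a minimal sketch. It assumes the panorama is stored in equirectangular form, so that image columns map linearly to azimuth, and it zeroes every column outside a chosen horizontal window. The function name `fov_mask`, its parameters, and the zero fill value are hypothetical choices for illustration, not taken from the GeoDistill code.

```python
import numpy as np


def fov_mask(panorama: np.ndarray, fov_deg: float, center_deg: float) -> np.ndarray:
    """Return a copy of an equirectangular panorama with columns outside
    a horizontal field-of-view window set to zero.

    Hypothetical helper: the name, signature, and zero fill are assumptions.
    """
    if not 0.0 < fov_deg <= 360.0:
        raise ValueError("fov_deg must be in (0, 360]")

    width = panorama.shape[1]
    # Azimuth of each column center, in degrees within [0, 360).
    azimuth = (np.arange(width) + 0.5) * 360.0 / width
    # Signed angular offset from the window center, wrapped to [-180, 180).
    offset = (azimuth - center_deg + 180.0) % 360.0 - 180.0
    keep = np.abs(offset) <= fov_deg / 2.0

    masked = panorama.copy()
    masked[:, ~keep] = 0  # blank out everything outside the window
    return masked


# Teacher input: the full panorama. Student input: a limited-FoV view.
panorama = np.ones((4, 360, 3), dtype=np.float32)
student_view = fov_mask(panorama, fov_deg=90.0, center_deg=0.0)
```

Wrapping the angular offset lets the window straddle the left and right image borders, which matters because a panorama is continuous in azimuth.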

Problem

Research questions and friction points this paper is trying to address.

Estimating camera pose without costly ground-truth annotations
Enhancing cross-view localization using self-distillation and FoV masking
Predicting orientation without precise planar position ground truth

Innovation

Methods, ideas, or system contributions that make the work stand out.

Geometry-guided self-distillation for localization
FoV-based masking enhances local feature learning
Novel orientation estimation without precise ground truth
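
One way to picture aligning the student's predictions with the teacher's is a cross-entropy between the two location distributions. The sketch below assumes each model outputs a 2-D score map over candidate positions on the satellite image. The function names, the temperature value, and the choice of cross-entropy are assumptions for illustration and may differ from the loss used in the paper.

```python
import numpy as np


def location_distribution(scores: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Softmax over all cells of a 2-D score map (hypothetical helper)."""
    flat = scores.reshape(-1) / temperature
    flat = flat - flat.max()  # subtract the max for numerical stability
    exp = np.exp(flat)
    return exp / exp.sum()


def distillation_loss(teacher_scores: np.ndarray,
                      student_scores: np.ndarray,
                      teacher_temperature: float = 0.5) -> float:
    """Cross-entropy from the teacher's distribution to the student's.

    A temperature below 1 sharpens the teacher target; this value is an
    assumption, not a setting reported in the source.
    """
    target = location_distribution(teacher_scores, teacher_temperature)
    log_pred = np.log(location_distribution(student_scores) + 1e-12)
    return float(-(target * log_pred).sum())


# Toy example: the teacher is confident about cell (2, 3).
teacher = np.zeros((8, 8))
teacher[2, 3] = 10.0
student_agree = teacher.copy()
student_disagree = np.zeros((8, 8))
student_disagree[6, 6] = 10.0

loss_agree = distillation_loss(teacher, student_agree)
loss_disagree = distillation_loss(teacher, student_disagree)
```

In this toy case the loss is small when the student's peak matches the teacher's and large when it does not, which is the pressure that pushes a limited-FoV student toward the panoramic teacher's prediction.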