AI Summary
Cross-view localization aims to align ground-level images with overhead satellite imagery to estimate 3-DoF camera pose, yet existing fully supervised methods heavily rely on costly and scarce ground-truth pose annotations. This paper proposes a geometry-guided weakly supervised self-distillation framework: (i) a field-of-view (FoV) masking mechanism enables local feature self-supervision within a teacher-student architecture; (ii) a geometric consistency loss enforces alignment in feature space; and (iii) a relative orientation estimation module operates without precise position labels. The method significantly reduces dependence on accurate pose supervision while enhancing modeling of salient structural cues. It achieves state-of-the-art performance across multiple benchmarks, supports both panoramic and narrow-field inputs, yields lower pose uncertainty, exhibits strong generalization, and demonstrates practical deployability.
Abstract
Cross-view localization, the task of estimating a camera's 3-degrees-of-freedom (3-DoF) pose by aligning ground-level images with satellite images, is crucial for large-scale outdoor applications such as autonomous navigation and augmented reality. Existing methods often rely on fully supervised learning, which requires costly ground-truth pose annotations. In this work, we propose GeoDistill, a Geometry-guided weakly supervised self-distillation framework that uses teacher-student learning with Field-of-View (FoV)-based masking to enhance local feature learning for robust cross-view localization. In GeoDistill, the teacher model localizes a panoramic image, while the student model predicts locations from a limited-FoV counterpart created by FoV-based masking. By aligning the student's predictions with those of the teacher, the student learns to focus on key features such as lane lines and to ignore textureless regions such as roads. This results in more accurate predictions and reduced uncertainty, regardless of whether the query images are panoramas or limited-FoV images. Our experiments show that GeoDistill significantly improves localization performance across different frameworks. Additionally, we introduce a novel orientation estimation network that predicts relative orientation without requiring precise planar position ground truth. GeoDistill provides a scalable and efficient solution for real-world cross-view localization challenges. Code and models can be found at https://github.com/tongshw/GeoDistill.
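The FoV-based masking and teacher-student alignment described in the abstract can be illustrated with a minimal NumPy sketch. It is not taken from the GeoDistill code. The function names (`fov_mask`, `softmax2d`, `distill_loss`), the equirectangular panorama layout, and the cross-entropy form of the alignment loss are all assumptions made for illustration; the abstract states only that the student's predictions are aligned with the teacher's.

```python
import numpy as np

def fov_mask(panorama, center_deg, fov_deg):
    """Zero out the columns of an (assumed) equirectangular panorama that
    fall outside a horizontal field of view centred at center_deg."""
    w = panorama.shape[1]
    # Azimuth of each column centre, in degrees in [0, 360).
    azimuth = (np.arange(w) + 0.5) * 360.0 / w
    # Signed angular distance to the FoV centre, wrapped to [-180, 180).
    delta = (azimuth - center_deg + 180.0) % 360.0 - 180.0
    keep = np.abs(delta) <= fov_deg / 2.0
    masked = panorama.copy()
    masked[:, ~keep] = 0
    return masked, keep

def softmax2d(logits):
    """Turn a 2-D map of location scores into a probability map."""
    z = logits - logits.max()
    p = np.exp(z)
    return p / p.sum()

def distill_loss(teacher_logits, student_logits, eps=1e-12):
    """Hypothetical alignment loss: cross-entropy of the student's location
    distribution against the teacher's (treated as a fixed target)."""
    t = softmax2d(teacher_logits)
    s = softmax2d(student_logits)
    return float(-(t * np.log(s + eps)).sum())

# Toy usage: a 90-degree student view cut from a 360-column panorama.
pano = np.ones((4, 360, 3), dtype=np.float32)
student_view, keep = fov_mask(pano, center_deg=0.0, fov_deg=90.0)

rng = np.random.default_rng(0)
teacher_scores = rng.normal(size=(8, 8))   # stand-in for teacher output
student_scores = rng.normal(size=(8, 8))   # stand-in for student output
loss = distill_loss(teacher_scores, student_scores)
```

In this sketch the mask wraps around the 0/360 degree seam, so a view centred at 0 degrees keeps columns from both ends of the panorama. The loss is smallest when the student's location map matches the teacher's, which is the behaviour the alignment step relies on.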