Coarse-to-Fine Monocular Re-Localization in OpenStreetMap via Semantic Alignment

📅 2026-03-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses two obstacles to lightweight, privacy-preserving monocular re-localization in OpenStreetMap (OSM): the large cross-modal discrepancy between natural images and map data, and the high computational cost of global matching. To this end, we propose a semantic-aware, hierarchical coarse-to-fine search framework. Our approach is the first to leverage the semantic perception capability of DINO-ViT for OSM-based re-localization, aligning semantic features between query images and map data while replacing exhaustive global matching with an efficient hierarchical retrieval strategy. Trained on a single dataset, the method's orientation recall at a 3° threshold exceeds the 5° recall of current state-of-the-art approaches, improving both localization accuracy and computational efficiency while keeping the map representation lightweight.
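To make the recall comparison concrete, here is a minimal sketch of how orientation recall at a threshold is commonly computed: the fraction of queries whose heading error is at most the threshold. The summary does not give the paper's exact metric definition, so the function names and the wrap-around handling below are assumptions for illustration only.

```python
def angular_error_deg(pred_deg, true_deg):
    """Smallest absolute difference between two headings, in degrees (0 to 180)."""
    diff = abs(pred_deg - true_deg) % 360.0
    return min(diff, 360.0 - diff)


def orientation_recall(pred_deg, true_deg, threshold_deg):
    """Fraction of queries whose heading error is within threshold_deg.

    Hypothetical helper: assumes one predicted and one ground-truth heading per query.
    """
    errors = [angular_error_deg(p, t) for p, t in zip(pred_deg, true_deg)]
    return sum(e <= threshold_deg for e in errors) / len(errors)


# Illustrative values only, not results from the paper.
predicted = [0.0, 359.0, 10.0, 182.0]
ground_truth = [2.0, 1.0, 6.0, 180.0]
recall_at_3 = orientation_recall(predicted, ground_truth, 3.0)
recall_at_5 = orientation_recall(predicted, ground_truth, 5.0)
```

For any single method, recall at 3° can never exceed its own recall at 5°, which is why a 3° recall that beats other methods' 5° recall is a strong result.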

📝 Abstract

Monocular re-localization plays a crucial role in enabling intelligent agents to achieve human-like perception. However, traditional methods rely on dense maps, which face scalability limitations and privacy risks. OpenStreetMap (OSM), a lightweight and privacy-preserving map, offers semantic and geometric information with global scalability. Nonetheless, two challenges remain in using OSM for localization: the inherent cross-modal discrepancies between natural images and OSM, and the high computational cost of global map-based localization. In this paper, we propose a hierarchical search framework with semantic alignment for localization in OSM. First, the semantic awareness of DINO-ViT is used to decompose images into visual elements and establish semantic relationships with OSM. Second, a coarse-to-fine search paradigm is designed to replace global dense matching, enabling efficient progressive refinement. Extensive experiments demonstrate that our method significantly improves both localization accuracy and speed. When trained on a single dataset, the 3° orientation recall of our method even outperforms the 5° recall of state-of-the-art methods.
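The coarse-to-fine idea in the abstract can be illustrated with a generic two-level grid search. This is a sketch under stated assumptions, not the paper's method: `score_fn` is a hypothetical similarity between the query image and a map cell, the search covers 2D position only (no orientation), and the score is assumed smooth enough that a block's centre cell is representative of the block.

```python
def coarse_to_fine_search(score_fn, size, stride, top_k=2):
    """Find the best (row, col) on a size x size grid without scoring every cell.

    score_fn(row, col) -> float is a hypothetical query-to-map similarity.
    """
    # Coarse pass: score one representative cell (the centre) per stride x stride block.
    coarse = []
    for r in range(0, size, stride):
        for c in range(0, size, stride):
            centre_r = min(r + stride // 2, size - 1)
            centre_c = min(c + stride // 2, size - 1)
            coarse.append((score_fn(centre_r, centre_c), r, c))
    coarse.sort(reverse=True)

    # Fine pass: score every cell, but only inside the top_k highest-scoring blocks.
    best = None
    for _, r0, c0 in coarse[:top_k]:
        for r in range(r0, min(r0 + stride, size)):
            for c in range(c0, min(c0 + stride, size)):
                s = score_fn(r, c)
                if best is None or s > best[0]:
                    best = (s, r, c)
    return best[1], best[2]


# Toy score with a single smooth peak at (37, 12); it also counts evaluations.
evaluations = []

def toy_score(r, c):
    evaluations.append((r, c))
    return -((r - 37) ** 2 + (c - 12) ** 2)

best_cell = coarse_to_fine_search(toy_score, size=64, stride=8)
```

On this toy 64 x 64 grid the search scores 64 coarse cells plus 128 fine cells instead of all 4096, which is the kind of saving that motivates replacing global dense matching. Keeping more than one candidate block (`top_k`) is a hedge against the coarse score ranking the true block second.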

Problem

Research questions and friction points this paper is trying to address.

monocular re-localization

OpenStreetMap

cross-modal discrepancy

computational cost

semantic alignment

Innovation

Methods, ideas, or system contributions that make the work stand out.

coarse-to-fine localization

semantic alignment

OpenStreetMap