🤖 AI Summary
Global visual geo-localization aims to infer the real-world geographic coordinates of an image solely from its visual content; the core challenge is effectively aligning visual and geographic representations. To address this, we propose Hierarchical Geographic Embedding (HGE), the first approach to model geographic priors as a multi-granularity hierarchical structure. We further design a semantic-segmentation-guided feature fusion module that jointly encodes appearance features and scene-level semantics, enabling fine-grained visual–geographic alignment. Evaluated on five standard benchmarks, our method outperforms existing state-of-the-art methods and mainstream large vision-language models on 22 of 25 metrics. It significantly improves robustness and accuracy in cross-region and multi-scale localization, demonstrating strong generalization under varying geographic and visual conditions.
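To make the hierarchical-embedding idea concrete, below is a minimal PyTorch sketch of one way multi-granularity geographic embeddings could be scored against an image embedding. The class name `HierarchicalGeoEmbedding`, the number of levels, the cell counts per level, and the cosine-similarity scoring are illustrative assumptions for exposition, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HierarchicalGeoEmbedding(nn.Module):
    """Illustrative multi-granularity geographic embedding (hypothetical).

    Each level partitions the globe into progressively finer cells
    (cell counts here are arbitrary placeholders); every cell owns a
    learned embedding that the query image's features are matched against.
    """

    def __init__(self, cells_per_level=(64, 1024, 16384), dim=512):
        super().__init__()
        self.levels = nn.ModuleList(
            nn.Embedding(n_cells, dim) for n_cells in cells_per_level
        )

    def forward(self, img_feat):
        # img_feat: (B, dim) visual embedding of the query image.
        img_feat = F.normalize(img_feat, dim=-1)
        logits_per_level = []
        for level in self.levels:
            cells = F.normalize(level.weight, dim=-1)    # (n_cells, dim)
            logits_per_level.append(img_feat @ cells.T)  # cosine scores
        return logits_per_level  # one score map per granularity


# Hypothetical usage: pick the best-matching cell at each granularity,
# yielding a coarse-to-fine localization of the query image.
model = HierarchicalGeoEmbedding()
scores = model(torch.randn(8, 512))
preds = [s.argmax(dim=-1) for s in scores]  # cell indices per level
```

One plausible training setup, consistent with the alignment framing above, would supervise each level with its own classification or contrastive loss so that coarse levels constrain the fine ones; the paper itself does not specify this in the summary.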
📝 Abstract
Worldwide visual geo-localization seeks to determine the geographic location of an image anywhere on Earth using only its visual content. Despite much progress, learning geographic representations for visual geo-localization remains an active research topic. We formulate geo-localization as aligning the visual representation of the query image with a learned geographic representation. Our novel geographic representation explicitly models the world as a hierarchy of geographic embeddings. Additionally, we introduce an approach that efficiently fuses the appearance features of the query image with its semantic segmentation map, forming a robust visual representation. Our main experiments set new state-of-the-art results on 22 of 25 metrics across five benchmark datasets, surpassing prior state-of-the-art (SOTA) methods and recent Large Vision-Language Models (LVLMs). Additional ablation studies support the claim that these gains stem primarily from the combination of geographic and visual representations.
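For the fusion step, here is a minimal sketch of how appearance features might be combined with a semantic segmentation map. Pooling the per-pixel class distribution into a scene-level descriptor and concatenating it with the global appearance embedding is an assumption made for illustration; `SegGuidedFusion`, the class count, and the MLP fusion head are hypothetical, not the module described in the abstract.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SegGuidedFusion(nn.Module):
    """Illustrative fusion of appearance features with a semantic
    segmentation map (class count and architecture are placeholders)."""

    def __init__(self, num_classes=19, feat_dim=512, out_dim=512):
        super().__init__()
        self.seg_proj = nn.Linear(num_classes, feat_dim)
        self.fuse = nn.Sequential(
            nn.Linear(2 * feat_dim, out_dim),
            nn.ReLU(inplace=True),
            nn.Linear(out_dim, out_dim),
        )

    def forward(self, appearance, seg_logits):
        # appearance: (B, feat_dim) global visual embedding.
        # seg_logits: (B, num_classes, H, W) per-pixel class scores.
        # Summarize the scene as its spatial mix of semantic classes.
        class_mix = F.softmax(seg_logits, dim=1).mean(dim=(2, 3))  # (B, C)
        seg_feat = self.seg_proj(class_mix)
        return self.fuse(torch.cat([appearance, seg_feat], dim=-1))


# Hypothetical usage with random inputs standing in for a backbone's
# appearance embedding and a segmenter's logit map.
fused = SegGuidedFusion()(torch.randn(4, 512), torch.randn(4, 19, 64, 64))
```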