DualGeo: A Dual-View Framework for Worldwide Image Geo-localization

📅 2026-04-28

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenges of visual feature instability caused by environmental changes and insufficient post-processing of candidate locations in global image geolocalization. The authors propose a two-stage framework: in the first stage, bidirectional cross-attention fuses image and semantic segmentation features, combined with dual-view contrastive learning to construct a robust global retrieval database; in the second stage, geospatial clustering re-ranks candidate locations, followed by a large multimodal model (LMM) to predict the final coordinates. This approach is the first to synergistically integrate semantic segmentation, dual-view contrastive learning, and LMMs for global-scale geolocalization. Evaluated on IM2GPS, IM2GPS3k, and YFCC4k benchmarks, it achieves absolute gains of 3.6%–16.58% at street-level (<1 km) and 1.29%–8.77% at city-level (<25 km) accuracy, significantly outperforming state-of-the-art methods.
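The street-level and city-level figures above are accuracy-at-threshold scores: the share of test images whose predicted coordinates fall within a given distance of the ground truth. The sketch below shows how such a score is typically computed. It assumes great-circle (haversine) distance and a mean Earth radius of 6371 km. The function names `haversine_km` and `accuracy_at` are illustrative and are not taken from the DualGeo code.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def accuracy_at(preds, truths, threshold_km):
    """Fraction of predictions that land within threshold_km of the ground truth."""
    hits = sum(haversine_km(*p, *t) < threshold_km for p, t in zip(preds, truths))
    return hits / len(preds)


# Hypothetical predictions and ground-truth coordinates, for illustration only.
preds = [(48.8584, 2.2945), (40.0, -74.0)]
truths = [(48.8606, 2.3376), (40.7128, -74.0060)]
street_level = accuracy_at(preds, truths, 1.0)   # threshold: 1 km
city_level = accuracy_at(preds, truths, 25.0)    # threshold: 25 km
```

Under this definition, a gain at the 1 km threshold means more images were placed within 1 km of where they were actually taken.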

📝 Abstract

Worldwide image geo-localization aims to infer the geographic location of an image captured anywhere on Earth, spanning street, city, regional, national, and continental scales. Existing methods rely on visual features that are sensitive to environmental variations (e.g., lighting, season, and weather) and lack effective post-processing to filter outlier candidates, limiting localization accuracy. To address these limitations, we propose DualGeo, a two-stage framework for worldwide image geo-localization. First, it establishes a geo-representational foundation by fusing image and semantic segmentation features via bidirectional cross-attention. The fused features are then aligned with GPS coordinates through dual-view contrastive learning to build a global retrieval database. Second, it performs geo-cognitive refinement by re-ranking retrieved candidates using geographic clustering and then feeding them into large multimodal models (LMMs) for final coordinate prediction. Experiments on IM2GPS, IM2GPS3k, and YFCC4k show that DualGeo outperforms state-of-the-art methods, improving street-level (<1 km) and city-level (<25 km) localization accuracy by 3.6%–16.58% and 1.29%–8.77%, respectively. Our code and datasets are available at https://github.com/CJ310177/DualGeo.
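To make the re-ranking idea concrete, the sketch below shows one way retrieved candidates could be reordered by geographic clustering so that isolated outliers drop to the end of the list. The abstract does not specify the clustering algorithm, so everything here is an assumption for illustration: the greedy radius-based grouping, the 25 km radius, the `(lat, lon, score)` candidate format, and the function name `rerank_by_cluster`. It is not the authors' implementation.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs given in degrees."""
    phi1, phi2 = math.radians(a[0]), math.radians(b[0])
    dphi = phi2 - phi1
    dlmb = math.radians(b[1] - a[1])
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def rerank_by_cluster(candidates, radius_km=25.0):
    """Reorder (lat, lon, score) candidates so that dense geographic clusters come first.

    Hypothetical greedy scheme: visit candidates from highest to lowest retrieval
    score and attach each one to the first cluster whose seed lies within radius_km,
    otherwise start a new cluster with it as the seed.
    """
    clusters = []
    for cand in sorted(candidates, key=lambda c: c[2], reverse=True):
        for cluster in clusters:
            if haversine_km(cand[:2], cluster[0][:2]) <= radius_km:
                cluster.append(cand)
                break
        else:
            clusters.append([cand])
    # Larger clusters first; ties are broken by the seed's retrieval score.
    clusters.sort(key=lambda cl: (len(cl), cl[0][2]), reverse=True)
    return [cand for cl in clusters for cand in cl]


# Hypothetical retrieval output: one high-scoring outlier and three nearby candidates.
retrieved = [
    (35.0, 139.0, 0.95),
    (48.85, 2.35, 0.90),
    (48.86, 2.34, 0.85),
    (48.80, 2.30, 0.80),
]
reranked = rerank_by_cluster(retrieved)
```

In this toy input, the three mutually close candidates outvote the single distant one even though it has the highest retrieval score, which is the kind of outlier filtering the second stage is meant to provide before the candidates are passed to an LMM.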

Problem

Research questions and friction points this paper is trying to address.

image geo-localization

environmental variations

outlier filtering

localization accuracy

worldwide scale

Innovation

Methods, ideas, or system contributions that make the work stand out.

dual-view contrastive learning

semantic segmentation fusion

geographic clustering