DualGeo: A Dual-View Framework for Worldwide Image Geo-localization

📅 2026-04-28
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the challenges of visual feature instability caused by environmental changes and insufficient post-processing of candidate locations in global image geolocalization. The authors propose a two-stage framework: in the first stage, bidirectional cross-attention fuses image and semantic segmentation features, combined with dual-view contrastive learning to construct a robust global retrieval database; in the second stage, geospatial clustering re-ranks candidate locations, followed by a large multimodal model (LMM) to predict the final coordinates. This approach is the first to synergistically integrate semantic segmentation, dual-view contrastive learning, and LMMs for global-scale geolocalization. Evaluated on IM2GPS, IM2GPS3k, and YFCC4k benchmarks, it achieves absolute gains of 3.6%–16.58% at street-level (<1 km) and 1.29%–8.77% at city-level (<25 km) accuracy, significantly outperforming state-of-the-art methods.
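The street-level (<1 km) and city-level (<25 km) figures above are accuracies at a distance threshold: the share of test images whose predicted coordinates fall within that distance of the ground truth. The following is a minimal sketch of such a metric, not the authors' evaluation code. It assumes great-circle (haversine) distance on a spherical Earth, and the names `haversine_km` and `accuracy_at` are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def accuracy_at(predictions, ground_truth, threshold_km):
    """Fraction of predictions strictly within threshold_km of the ground truth."""
    if len(predictions) != len(ground_truth):
        raise ValueError("predictions and ground_truth must have the same length")
    if not predictions:
        return 0.0
    hits = sum(
        1
        for (plat, plon), (glat, glon) in zip(predictions, ground_truth)
        if haversine_km(plat, plon, glat, glon) < threshold_km
    )
    return hits / len(predictions)


# Illustrative coordinates only; these are not taken from any benchmark.
preds = [(48.8584, 2.2945), (48.9, 2.4), (40.0, -74.0)]
truth = [(48.8585, 2.2946)] * 3
street_acc = accuracy_at(preds, truth, 1.0)   # threshold from the summary: <1 km
city_acc = accuracy_at(preds, truth, 25.0)    # threshold from the summary: <25 km
```

With these made-up points, one of three predictions lands within 1 km and two of three within 25 km, so the city-level accuracy is always at least the street-level one.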
📝 Abstract
Worldwide image geo-localization aims to infer the geographic location of an image captured anywhere on Earth, spanning street, city, regional, national, and continental scales. Existing methods rely on visual features that are sensitive to environmental variations (e.g., lighting, season, and weather) and lack effective post-processing to filter outlier candidates, limiting localization accuracy. To address these limitations, we propose DualGeo, a two-stage framework for worldwide image geo-localization. First, it establishes a geo-representational foundation by fusing image and semantic segmentation features via bidirectional cross-attention. The fused features are then aligned with GPS coordinates through dual-view contrastive learning to build a global retrieval database. Second, it performs geo-cognitive refinement by re-ranking retrieved candidates using geographic clustering. It then feeds them into large multimodal models (LMMs) for final coordinate prediction. Experiments on IM2GPS, IM2GPS3k, and YFCC4k show that DualGeo outperforms state-of-the-art methods, improving street-level (<1 km) and city-level (<25 km) localization accuracy by 3.6%–16.58% and 1.29%–8.77%, respectively. Our code and datasets are available at https://github.com/CJ310177/DualGeo.
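The abstract does not say which clustering algorithm drives the re-ranking step, so the following is only a sketch of the general idea: retrieved candidates that agree geographically are promoted, and isolated outliers are demoted. It assumes a simple greedy radius-based grouping. The function `rerank_by_geo_cluster`, the `(lat, lon, score)` candidate format, and the 25 km default radius are all hypothetical choices for illustration and are not taken from the DualGeo code.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two candidates (lat, lon, ...) in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = math.sin((lat2 - lat1) / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * math.asin(min(1.0, math.sqrt(h)))


def rerank_by_geo_cluster(candidates, radius_km=25.0):
    """Re-rank (lat, lon, score) candidates so geographically consistent groups come first.

    Greedy scheme (an assumption of this sketch): visit candidates from highest to
    lowest score, attach each to the first cluster whose seed lies within radius_km,
    otherwise start a new cluster seeded by that candidate.
    """
    clusters = []  # each cluster is a list; its first element is the seed
    for cand in sorted(candidates, key=lambda c: c[2], reverse=True):
        for cluster in clusters:
            if haversine_km(cand, cluster[0]) <= radius_km:
                cluster.append(cand)
                break
        else:
            clusters.append([cand])
    # Larger clusters first; ties are broken by the seed's retrieval score.
    clusters.sort(key=lambda cl: (len(cl), cl[0][2]), reverse=True)
    return [cand for cluster in clusters for cand in cluster]


# Made-up candidates: one high-scoring outlier, three mutually close points, one far point.
candidates = [
    (35.0, 139.0, 0.95),
    (48.85, 2.35, 0.90),
    (48.86, 2.34, 0.85),
    (48.80, 2.40, 0.80),
    (-33.9, 151.2, 0.70),
]
reranked = rerank_by_geo_cluster(candidates)
```

In this toy input the three nearby candidates form the largest cluster and move ahead of the single higher-scoring outlier. In a pipeline like the one described, the re-ranked list would then be passed to the LMM for the final coordinate prediction; that step is not sketched here.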
Problem

Research questions and friction points this paper is trying to address.

image geo-localization
environmental variations
outlier filtering
localization accuracy
worldwide scale
Innovation

Methods, ideas, or system contributions that make the work stand out.

dual-view contrastive learning
semantic segmentation fusion
geographic clustering
large multimodal models
image geo-localization
Junchao Cui
Henan Key Laboratory of Cyberspace Situation Awareness, Zhengzhou, China; Information Engineering University, Zhengzhou, China
Wenqi Shi
Assistant Professor, University of Texas Southwestern Medical Center
AI for Healthcare, LLM Agent, Clinical Decision Support, Clinical Informatics
Shaoyong Du
Henan Key Laboratory of Cyberspace Situation Awareness, Zhengzhou, China; Information Engineering University, Zhengzhou, China
Hang He
East China Normal University
AI Agent, Reinforcement Learning, VLM, IR, LLM4SE
Xuanzi Ma
Information Engineering University, Zhengzhou, China
Hao Tang
Henan Key Laboratory of Cyberspace Situation Awareness, Zhengzhou, China; Information Engineering University, Zhengzhou, China
Xiangyang Luo
Zhengzhou Information Science and Technology Institute
information hiding, data hiding, steganography