CLNet: Cross-View Correspondence Makes a Stronger Geo-Localizationer

📅 2025-12-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
Image retrieval-based cross-view geo-localization (IRCVGL) faces significant challenges due to the large viewpoint disparity between satellite and street-level imagery, which hinders explicit spatial correspondence modeling. To address this, the paper proposes a correspondence-aware feature refinement framework, CLNet, built around a Neural Correspondence Map that models pixel-level spatial mappings. This is combined with a Nonlinear Embedding Converter for semantically consistent cross-view feature alignment and a Global Feature Recalibration mechanism that enhances geometric consistency. The method jointly leverages implicit correspondence-field modeling, MLP-driven feature transformation, and spatially guided channel reweighting. Evaluated on four major benchmarks (CVUSA, CVACT, VIGOR, and University-1652), it achieves state-of-the-art performance, substantially improving localization accuracy, generalization, and interpretability.

📝 Abstract
Image retrieval-based cross-view geo-localization (IRCVGL) aims to match images captured from significantly different viewpoints, such as satellite and street-level images. Existing methods predominantly rely on learning robust global representations or implicit feature alignment, which often fail to model explicit spatial correspondences crucial for accurate localization. In this work, we propose a novel correspondence-aware feature refinement framework, termed CLNet, that explicitly bridges the semantic and geometric gaps between different views. CLNet decomposes the view alignment process into three learnable and complementary modules: a Neural Correspondence Map (NCM) that spatially aligns cross-view features via latent correspondence fields; a Nonlinear Embedding Converter (NEC) that remaps features across perspectives using an MLP-based transformation; and a Global Feature Recalibration (GFR) module that reweights informative feature channels guided by learned spatial cues. The proposed CLNet can jointly capture both high-level semantics and fine-grained alignments. Extensive experiments on four public benchmarks, CVUSA, CVACT, VIGOR, and University-1652, demonstrate that our proposed CLNet achieves state-of-the-art performance while offering better interpretability and generalizability.
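The abstract's Neural Correspondence Map (NCM) can be pictured as a soft, similarity-driven matching between the two views' local feature grids. The paper does not publish its exact formulation here, so the sketch below is a plausible minimal version under common assumptions: cosine similarity between flattened feature grids, a softmax to form a latent correspondence field, and a correspondence-weighted warp of satellite features onto the street-view layout. The function name, temperature value, and warping step are all illustrative guesses, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def neural_correspondence_map(street_feats, sat_feats, temperature=0.1):
    """Hypothetical NCM sketch: soft pixel-level correspondence between
    two flattened local-feature grids.

    street_feats: (Ns, C) local descriptors from the street-view branch
    sat_feats:    (Nv, C) local descriptors from the satellite branch
    Returns (corr, warped): the (Ns, Nv) latent correspondence field and
    the satellite features warped onto the street-view spatial layout.
    """
    # L2-normalize so the dot product becomes a cosine similarity
    s = street_feats / np.linalg.norm(street_feats, axis=1, keepdims=True)
    v = sat_feats / np.linalg.norm(sat_feats, axis=1, keepdims=True)
    corr = softmax((s @ v.T) / temperature, axis=-1)  # soft correspondence field
    warped = corr @ sat_feats                         # correspondence-weighted warp
    return corr, warped

# toy example: 6 street positions, 8 satellite positions, 16-d descriptors
rng = np.random.default_rng(0)
corr, warped = neural_correspondence_map(rng.normal(size=(6, 16)),
                                         rng.normal(size=(8, 16)))
```

Each row of `corr` is a distribution over satellite positions, which is what makes the alignment interpretable as an explicit spatial mapping rather than an opaque global embedding.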
Problem

Research questions and friction points this paper is trying to address.

Explicitly models spatial correspondences for cross-view geo-localization
Bridges semantic and geometric gaps between satellite and street-level images
Enhances interpretability and generalizability in image retrieval-based localization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Neural Correspondence Map aligns cross-view spatial features
Nonlinear Embedding Converter remaps features across perspectives
Global Feature Recalibration reweights channels using spatial cues
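The two remaining modules named above can also be sketched under stated assumptions. For the Nonlinear Embedding Converter (NEC), the abstract says only that it is an MLP-based cross-view remapping, so a two-layer ReLU MLP is assumed; for Global Feature Recalibration (GFR), a squeeze-and-excitation-style channel gate driven by globally pooled spatial statistics is a natural reading of "reweights informative feature channels guided by learned spatial cues". Both function names and all dimensions are illustrative, not taken from the paper.

```python
import numpy as np

def nec_remap(x, W1, b1, W2, b2):
    """Hypothetical NEC sketch: a two-layer MLP that remaps features
    from one view's embedding space into the other's.
    x: (N, C) features; returns (N, C) remapped features."""
    h = np.maximum(x @ W1 + b1, 0.0)  # ReLU hidden layer
    return h @ W2 + b2

def gfr_reweight(feats):
    """Hypothetical GFR sketch: channel reweighting driven by globally
    pooled spatial statistics (squeeze-and-excitation style).
    feats: (H, W, C); returns channel-recalibrated features, same shape."""
    pooled = feats.mean(axis=(0, 1))      # global spatial descriptor, (C,)
    gate = 1.0 / (1.0 + np.exp(-pooled))  # sigmoid gate in (0, 1) per channel
    return feats * gate                   # scale informative channels up, others down

# toy example: remap 4 feature vectors, recalibrate an 8x8x16 feature map
rng = np.random.default_rng(1)
C, H = 16, 32
x = rng.normal(size=(4, C))
y = nec_remap(x, rng.normal(size=(C, H)), np.zeros(H),
              rng.normal(size=(H, C)), np.zeros(C))
z = gfr_reweight(rng.normal(size=(8, 8, C)))
```

In a real pipeline the gate in `gfr_reweight` would come from a small learned network rather than a raw sigmoid over pooled activations; the sketch only shows the data flow of spatially guided channel reweighting.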
Xianwei Cao
School of Artificial Intelligence, Xidian University
Dou Quan
Xidian University
computer vision, deep learning
Shuang Wang
School of Artificial Intelligence, Xidian University
Ning Huyan
School of Artificial Intelligence, Xidian University
Wei Wang
School of Artificial Intelligence, Xidian University
Yunan Li
School of Artificial Intelligence, Xidian University
Licheng Jiao
Distinguished Professor of Xidian University, IEEE Fellow
Neural Networks, Computational Intelligence, Evolutionary Computation, Remote Sensing, Pattern Recognition