CLNet: Cross-View Correspondence Makes a Stronger Geo-Localizationer

📅 2025-12-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
Image retrieval-based cross-view geo-localization (IRCVGL) faces significant challenges due to the large viewpoint disparity between satellite and street-level imagery, which hinders explicit spatial correspondence modeling. To address this, the paper proposes a correspondence-aware feature refinement framework, CLNet, built around a Neural Correspondence Map that models pixel-level spatial mappings. This is combined with a Nonlinear Embedding Converter for semantically consistent cross-view feature alignment and a Global Feature Recalibration mechanism that enhances geometric consistency. The method jointly leverages implicit correspondence-field modeling, MLP-driven feature transformation, and spatially guided channel reweighting. Evaluated on four major benchmarks (CVUSA, CVACT, VIGOR, and University-1652), it achieves state-of-the-art performance, substantially improving localization accuracy, generalization, and interpretability.

📝 Abstract
Image retrieval-based cross-view geo-localization (IRCVGL) aims to match images captured from significantly different viewpoints, such as satellite and street-level images. Existing methods predominantly rely on learning robust global representations or implicit feature alignment, which often fail to model explicit spatial correspondences crucial for accurate localization. In this work, we propose a novel correspondence-aware feature refinement framework, termed CLNet, that explicitly bridges the semantic and geometric gaps between different views. CLNet decomposes the view alignment process into three learnable and complementary modules: a Neural Correspondence Map (NCM) that spatially aligns cross-view features via latent correspondence fields; a Nonlinear Embedding Converter (NEC) that remaps features across perspectives using an MLP-based transformation; and a Global Feature Recalibration (GFR) module that reweights informative feature channels guided by learned spatial cues. The proposed CLNet can jointly capture both high-level semantics and fine-grained alignments. Extensive experiments on four public benchmarks, CVUSA, CVACT, VIGOR, and University-1652, demonstrate that our proposed CLNet achieves state-of-the-art performance while offering better interpretability and generalizability.
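The abstract's Neural Correspondence Map (NCM) can be pictured as a soft, similarity-driven matching between the two views' local feature grids. The paper does not publish its exact formulation here, so the sketch below is a plausible minimal version under common assumptions: cosine similarity between flattened feature grids, a softmax to form a latent correspondence field, and a correspondence-weighted warp of satellite features onto the street-view layout. The function name, temperature value, and warping step are all illustrative guesses, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def neural_correspondence_map(street_feats, sat_feats, temperature=0.1):
    """Hypothetical NCM sketch: soft pixel-level correspondence between
    two flattened local-feature grids.

    street_feats: (Ns, C) local descriptors from the street-view branch
    sat_feats:    (Nv, C) local descriptors from the satellite branch
    Returns (corr, warped): the (Ns, Nv) latent correspondence field and
    the satellite features warped onto the street-view spatial layout.
    """
    # L2-normalize so the dot product becomes a cosine similarity
    s = street_feats / np.linalg.norm(street_feats, axis=1, keepdims=True)
    v = sat_feats / np.linalg.norm(sat_feats, axis=1, keepdims=True)
    corr = softmax((s @ v.T) / temperature, axis=-1)  # soft correspondence field
    warped = corr @ sat_feats                         # correspondence-weighted warp
    return corr, warped

# toy example: 6 street positions, 8 satellite positions, 16-d descriptors
rng = np.random.default_rng(0)
corr, warped = neural_correspondence_map(rng.normal(size=(6, 16)),
                                         rng.normal(size=(8, 16)))
```

Each row of `corr` is a distribution over satellite positions, which is what makes the alignment interpretable as an explicit spatial mapping rather than an opaque global embedding.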
Problem

Research questions and friction points this paper is trying to address.

Explicitly models spatial correspondences for cross-view geo-localization
Bridges semantic and geometric gaps between satellite and street-level images
Enhances interpretability and generalizability in image retrieval-based localization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Neural Correspondence Map aligns cross-view spatial features
Nonlinear Embedding Converter remaps features across perspectives
Global Feature Recalibration reweights channels using spatial cues
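The two remaining modules named above can also be sketched under stated assumptions. For the Nonlinear Embedding Converter (NEC), the abstract says only that it is an MLP-based cross-view remapping, so a two-layer ReLU MLP is assumed; for Global Feature Recalibration (GFR), a squeeze-and-excitation-style channel gate driven by globally pooled spatial statistics is a natural reading of "reweights informative feature channels guided by learned spatial cues". Both function names and all dimensions are illustrative, not taken from the paper.

```python
import numpy as np

def nec_remap(x, W1, b1, W2, b2):
    """Hypothetical NEC sketch: a two-layer MLP that remaps features
    from one view's embedding space into the other's.
    x: (N, C) features; returns (N, C) remapped features."""
    h = np.maximum(x @ W1 + b1, 0.0)  # ReLU hidden layer
    return h @ W2 + b2

def gfr_reweight(feats):
    """Hypothetical GFR sketch: channel reweighting driven by globally
    pooled spatial statistics (squeeze-and-excitation style).
    feats: (H, W, C); returns channel-recalibrated features, same shape."""
    pooled = feats.mean(axis=(0, 1))      # global spatial descriptor, (C,)
    gate = 1.0 / (1.0 + np.exp(-pooled))  # sigmoid gate in (0, 1) per channel
    return feats * gate                   # scale informative channels up, others down

# toy example: remap 4 feature vectors, recalibrate an 8x8x16 feature map
rng = np.random.default_rng(1)
C, H = 16, 32
x = rng.normal(size=(4, C))
y = nec_remap(x, rng.normal(size=(C, H)), np.zeros(H),
              rng.normal(size=(H, C)), np.zeros(C))
z = gfr_reweight(rng.normal(size=(8, 8, C)))
```

In a real pipeline the gate in `gfr_reweight` would come from a small learned network rather than a raw sigmoid over pooled activations; the sketch only shows the data flow of spatially guided channel reweighting.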
Xianwei Cao
School of Artificial Intelligence, Xidian University
Dou Quan
Xidian University
computer vision, deep learning
Shuang Wang
School of Artificial Intelligence, Xidian University
Ning Huyan
School of Artificial Intelligence, Xidian University
Wei Wang
School of Artificial Intelligence, Xidian University
Yunan Li
School of Artificial Intelligence, Xidian University
Licheng Jiao
Distinguished Professor of Xidian University, IEEE Fellow
Neural Networks, Computational Intelligence, Evolutionary Computation, Remote Sensing, Pattern Recognition