CVGL: Causal Learning and Geometric Topology

📅 2026-03-12
🤖 AI Summary
This work addresses cross-view geo-localization, where viewpoint discrepancies and confounding factors hinder accurate matching between street-level and aerial images. To this end, we propose the CLGT framework, which integrates causal learning with geometric topological information. Specifically, a causal feature extractor suppresses the influence of confounding factors so that the model focuses on task-relevant semantics, while a geometric fusion module injects bird's-eye-view (BEV) road topology into street features to mitigate viewpoint inconsistency. Additionally, a data-adaptive pooling mechanism strengthens the representation of semantically rich regions. Experiments on CVUSA, CVACT, and their robustness-oriented variants (CVUSA-C-ALL and CVACT-C-ALL) show that CLGT improves localization accuracy and achieves state-of-the-art performance, particularly under real-world corruptions.
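The summary describes data-adaptive pooling only at a high level, and its exact design is not given here. As a hedged illustration of the general idea (weighting local features by estimated importance instead of averaging them uniformly), the sketch below implements a generic softmax-attention pooling. The function names `adaptive_pool` and `softmax` and the fixed `query` vector (a stand-in for learned parameters) are hypothetical; this is not the CLGT module.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float], temperature: float = 1.0) -> List[float]:
    # Subtract the max score before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_pool(
    features: Sequence[Sequence[float]],
    query: Sequence[float],
    temperature: float = 1.0,
) -> List[float]:
    """Pool N local feature vectors (each of dim D) into one D-dim descriptor.

    Hypothetical stand-in for data-adaptive pooling: each location is scored by
    its dot product with `query`, scores are softmax-normalized, and features
    are averaged with those weights. Equal scores reduce this to average pooling.
    """
    scores = [sum(f_d * q_d for f_d, q_d in zip(f, query)) for f in features]
    weights = softmax(scores, temperature)
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]


# Usage: a query aligned with the first location concentrates the weight there.
print(adaptive_pool([[1.0, 0.0], [0.0, 1.0]], [10.0, 0.0]))
```

With a zero query every location receives the same weight, so the result is plain average pooling; a query aligned with one location pushes the descriptor toward that location's features, which is the sense in which such pooling "emphasizes" informative regions.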

📝 Abstract
Cross-view geo-localization (CVGL) aims to estimate the geographic location of a street image by matching it with a corresponding aerial image. This is critical for autonomous navigation and mapping in complex real-world scenarios. However, the task remains challenging due to significant viewpoint differences and the influence of confounding factors. To tackle these issues, we propose the Causal Learning and Geometric Topology (CLGT) framework, which integrates two key components: a Causal Feature Extractor (CFE) that mitigates the influence of confounding factors by leveraging causal intervention to encourage the model to focus on stable, task-relevant semantics; and a Geometric Topology Fusion (GT Fusion) module that injects Bird's Eye View (BEV) road topology into street features to alleviate cross-view inconsistencies caused by extreme perspective changes. Additionally, we introduce a Data-Adaptive Pooling (DA Pooling) module to enhance the representation of semantically rich regions. Extensive experiments on CVUSA, CVACT, and their robustness-enhanced variants (CVUSA-C-ALL and CVACT-C-ALL) demonstrate that CLGT achieves state-of-the-art performance, particularly under challenging real-world corruptions. Our code is available at https://github.com/oyss-szu/CLGT.
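The abstract frames CVGL as matching a street image against aerial images. On benchmarks such as CVUSA and CVACT this is typically evaluated as retrieval: each street query is compared with every aerial reference, and Recall@K reports how often the true match lands in the top K. The minimal sketch below assumes each model has already produced one embedding vector per image and that `street[i]` pairs with `aerial[i]`; the helper names are hypothetical and this is not code from the CLGT repository.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    # Leave all-zero vectors unchanged to avoid dividing by zero.
    return list(v) if norm == 0 else [x / norm for x in v]


def recall_at_k(
    street: Sequence[Sequence[float]],
    aerial: Sequence[Sequence[float]],
    k: int,
) -> float:
    """Fraction of street queries whose true aerial match ranks in the top k.

    Assumes street[i] and aerial[i] form the ground-truth pair, with exactly
    one aerial reference per query.
    """
    refs = [l2_normalize(a) for a in aerial]
    hits = 0
    for i, s in enumerate(street):
        q = l2_normalize(s)
        # Cosine similarity = dot product of L2-normalized vectors.
        sims = [sum(qd * rd for qd, rd in zip(q, r)) for r in refs]
        # Rank of the true match = number of references scored strictly higher.
        rank = sum(1 for x in sims if x > sims[i])
        if rank < k:
            hits += 1
    return hits / len(street)


# Usage: embeddings whose pairs are swapped miss at K=1 but hit at K=2.
street = [[1.0, 0.0], [0.0, 1.0]]
aerial = [[0.0, 1.0], [1.0, 0.0]]
print(recall_at_k(street, aerial, 1), recall_at_k(street, aerial, 2))
```

Ties are counted in the query's favor here (only strictly higher scores push the rank down); an evaluation script may break ties differently.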
Problem

Research questions and friction points this paper is trying to address.

cross-view geo-localization
viewpoint differences
confounding factors
geographic location estimation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Causal Learning
Geometric Topology
Cross-view Geo-localization
Bird's Eye View
Data-Adaptive Pooling
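Causal intervention, which the abstract names as the basis of the Causal Feature Extractor, is commonly formalized with Pearl's backdoor adjustment: the prediction is averaged over values of a confounder instead of letting the confounder co-vary with the input. The abstract does not give CFE's exact formulation, so the identity below is the textbook form rather than the paper's objective. Reading $X$ as the image features, $Y$ as the matching outcome, and $z$ as a confounder such as weather or illumination is an illustrative assumption.

```latex
P\big(Y \mid \mathrm{do}(X)\big) \;=\; \sum_{z} P\big(Y \mid X, z\big)\, P(z)
```

Intervening with $\mathrm{do}(X)$ removes the path from $z$ into $X$, so features that merely correlate with the confounder no longer raise the predicted match probability; this is the property the abstract appeals to when it speaks of focusing on "stable, task-relevant semantics".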
Songsong Ouyang
College of Computer Science and Software Engineering, Shenzhen University
Yingying Zhu
Shenzhen University
Computer Vision, Artificial Intelligence