JRN-Geo: A Joint Perception Network based on RGB and Normal images for Cross-view Geo-localization

📅 2025-09-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
Cross-view geo-localization faces significant challenges due to large viewpoint discrepancies and substantial appearance variations between aerial and ground-level imagery. Existing approaches predominantly rely on RGB semantic features while neglecting geometric structural cues. To address this, the authors propose a dual-branch joint-perception network that simultaneously encodes RGB semantic features and surface normal map–based geometric structural features. They introduce a Difference-Aware Fusion Module (DAFM) and a Joint-Constrained Interaction Aggregation (JCIA) strategy to enable deep semantic–structural feature collaboration, and incorporate a multi-view augmentation scheme grounded in 3D geospatial data to enhance viewpoint-invariant representation learning. Extensive experiments demonstrate that the method achieves state-of-the-art performance on both the University-1652 and SUES-200 benchmarks, outperforming existing approaches in accuracy and robustness.

📝 Abstract
Cross-view geo-localization plays a critical role in Unmanned Aerial Vehicle (UAV) localization and navigation. However, significant challenges arise from the drastic viewpoint differences and appearance variations between images. Existing methods predominantly rely on semantic features from RGB images, often neglecting the importance of spatial structural information in capturing viewpoint-invariant features. To address this issue, we incorporate geometric structural information from normal images and introduce a Joint perception network to integrate RGB and Normal images (JRN-Geo). Our approach utilizes a dual-branch feature extraction framework, leveraging a Difference-Aware Fusion Module (DAFM) and Joint-Constrained Interaction Aggregation (JCIA) strategy to enable deep fusion and joint-constrained semantic and structural information representation. Furthermore, we propose a 3D geographic augmentation technique to generate potential viewpoint variation samples, enhancing the network's ability to learn viewpoint-invariant features. Extensive experiments on the University-1652 and SUES-200 datasets validate the robustness of our method against complex viewpoint variations, achieving state-of-the-art performance.
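The abstract's structural branch consumes normal images rather than raw RGB. The paper does not spell out here how those normal maps are obtained, but a common way to derive one from a depth image is via finite-difference gradients; the sketch below illustrates that standard scheme (the camera-intrinsic scaling `fx`/`fy` and the gradient formulation are assumptions, not the paper's pipeline):

```python
import numpy as np

def normals_from_depth(depth, fx=1.0, fy=1.0):
    """Estimate a per-pixel surface normal map from a depth image.

    Illustrative stand-in only: the paper pairs RGB images with normal
    images, but its normal-map generation is not detailed here. This
    gradient-based construction is a widely used approximation.
    """
    dz_dv, dz_du = np.gradient(depth)              # row-wise (v) and column-wise (u) depth gradients
    # For a depth surface z(u, v), an (unnormalized) normal is (-dz/du * fx, -dz/dv * fy, 1).
    n = np.dstack([-dz_du * fx, -dz_dv * fy, np.ones_like(depth)])
    n /= np.linalg.norm(n, axis=2, keepdims=True)  # normalize to unit length
    return n

# Tiny example: a tilted plane, whose normal is constant everywhere.
u = np.arange(8, dtype=float)
depth = np.tile(0.5 * u, (8, 1))                   # depth increases linearly along u
normals = normals_from_depth(depth)
```

Encoding geometry this way makes the structural branch largely insensitive to texture and illumination changes, which is the property the abstract exploits for viewpoint invariance.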
Problem

Research questions and friction points this paper is trying to address.

Addressing viewpoint differences in cross-view geo-localization for UAVs
Integrating geometric structural information with RGB image features
Learning viewpoint-invariant features through joint semantic-structural representation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Joint RGB-Normal image perception network
Dual-branch fusion with DAFM and JCIA
3D geographic augmentation for viewpoint invariance
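The dual-branch fusion idea above can be sketched numerically. The DAFM's internals are not reproduced on this page, so the gate below (a sigmoid computed from the branch difference, mixing the two feature streams per channel) is a hypothetical stand-in for how a difference-aware module might combine semantic and structural descriptors:

```python
import numpy as np

rng = np.random.default_rng(0)

def difference_aware_fusion(f_rgb, f_norm, w_gate):
    """Fuse semantic (RGB-branch) and structural (normal-branch) features.

    Sketch only: the paper's DAFM is not specified here. We assume a
    gate derived from the branch difference that decides, per channel,
    how much of each branch enters the fused descriptor.
    """
    diff = f_rgb - f_norm                            # where the two branches disagree
    gate = 1.0 / (1.0 + np.exp(-(diff @ w_gate)))    # sigmoid gate in (0, 1)
    return gate * f_rgb + (1.0 - gate) * f_norm      # convex per-channel mixture

d = 16                                               # toy feature dimension
f_rgb = rng.normal(size=(4, d))                      # batch of 4 RGB-branch features
f_norm = rng.normal(size=(4, d))                     # matching normal-branch features
w_gate = rng.normal(size=(d, d)) / np.sqrt(d)        # hypothetical gate weights
fused = difference_aware_fusion(f_rgb, f_norm, w_gate)
```

Because the gate stays strictly between 0 and 1, each fused channel is a convex combination of the two branches, so neither the semantic nor the structural signal can be entirely discarded.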
Hongyu Zhou
College of Information Science and Engineering, Northeastern University, Shenyang 110819, China
Yunzhou Zhang
College of Information Science and Engineering, Northeastern University, Shenyang 110819, China
Tingsong Huang
School of Computer Science, University of Sheffield, Sheffield S1 4DP, United Kingdom
Fawei Ge
College of Information Science and Engineering, Northeastern University, Shenyang 110819, China
Man Qi
College of Information Science and Engineering, Northeastern University, Shenyang 110819, China
Xichen Zhang
The Hong Kong University of Science and Technology
Yizhong Zhang
College of Information Science and Engineering, Northeastern University, Shenyang 110819, China