VFM-Loc: Zero-Shot Cross-View Geo-Localization via Aligning Discriminative Visual Hierarchies

📅 2026-03-14
🤖 AI Summary
This work addresses the limited generalization of cross-view geo-localization in remote sensing, which stems from large viewpoint discrepancies and dataset bias. The paper proposes VFM-Loc, a training-free framework for zero-shot matching of drone-view queries against geo-tagged satellite images. Using features from vision foundation models, the method extracts discriminative visual cues hierarchically with Generalized Mean pooling and Scale-Weighted RMAC, then linearly aligns the drone and satellite feature distributions through domain-wise PCA and Orthogonal Procrustes analysis. The approach reports strong zero-shot accuracy on standard benchmarks and surpasses supervised methods by over 20% in Recall@1 on the LO-UCV dataset, which features large oblique angles. The results suggest that principled alignment of pre-trained features can bridge the cross-view gap without task-specific training.
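
To make the pooling step concrete, here is a minimal sketch of Generalized Mean (GeM) pooling in plain Python. It shows only the textbook formula, in which each output channel is the p-th root of the mean of the p-th powers of the local activations. The function names `gem_pool` and `l2_normalize`, the default `p=3.0`, and the toy `patches` input are illustrative assumptions, not details taken from the paper or its code.

```python
import math


def gem_pool(features, p=3.0, eps=1e-6):
    """Generalized Mean pooling over a list of local descriptors.

    features: list of N equal-length vectors, one per spatial location.
    Returns one vector whose c-th entry is (mean_i max(x_ic, eps)^p)^(1/p).
    p = 1 gives average pooling; large p approaches max pooling.
    """
    if not features:
        raise ValueError("need at least one local descriptor")
    n = len(features)
    dim = len(features[0])
    pooled = []
    for c in range(dim):
        # Clamp to eps so fractional powers stay well defined.
        acc = sum(max(row[c], eps) ** p for row in features)
        pooled.append((acc / n) ** (1.0 / p))
    return pooled


def l2_normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


# Hypothetical toy input: three local descriptors with two channels each.
patches = [[1.0, 0.0], [3.0, 0.0], [2.0, 4.0]]
avg_like = gem_pool(patches, p=1.0)   # behaves like average pooling
peaky = gem_pool(patches, p=10.0)     # leans toward the strongest response
descriptor = l2_normalize(gem_pool(patches))
```

A larger `p` weights strong, localized activations more heavily, which is the usual motivation for using GeM when distinctive cues must survive pooling. How VFM-Loc combines this with Scale-Weighted RMAC is not specified in this summary.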

📝 Abstract
Cross-View Geo-Localization (CVGL) in remote sensing aims to locate a drone-view query by matching it to geo-tagged satellite images. Although supervised methods have achieved strong results on closed-set benchmarks, they often fail to generalize to unconstrained, real-world scenarios due to severe viewpoint differences and dataset bias. To overcome these limitations, we present VFM-Loc, a training-free framework for zero-shot CVGL that leverages the generalizable visual representations from vision foundation models (VFMs). VFM-Loc identifies and matches discriminative visual clues across different viewpoints through a progressive alignment strategy. First, we design a hierarchical clue extraction mechanism using Generalized Mean pooling and Scale-Weighted RMAC to preserve distinctive visual clues across scales while maintaining hierarchical confidence. Second, we introduce a statistical manifold alignment pipeline based on domain-wise PCA and Orthogonal Procrustes analysis, linearly aligning heterogeneous feature distributions in a shared metric space. Experiments demonstrate that VFM-Loc exhibits strong zero-shot accuracy on standard benchmarks and surpasses supervised methods by over 20% in Recall@1 on the challenging LO-UCV dataset with large oblique angles. This work highlights that principled alignment of pre-trained features can effectively bridge the cross-view gap, establishing a robust and training-free paradigm for real-world CVGL. The relevant code is made available at: https://github.com/DingLei14/VFM-Loc.
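
The alignment stage described above can be illustrated with a small NumPy sketch on synthetic data. It fits PCA separately per domain, then solves the orthogonal Procrustes problem to rotate drone-side features onto satellite-side features, and finally scores Recall@1. This is not the authors' implementation. The helper names, the synthetic "domain gap" (a pure rotation plus a shift), and in particular the use of row-paired anchor descriptors to fit the rotation are all assumptions made for illustration; the abstract does not say how correspondences are obtained.

```python
import numpy as np


def fit_pca(x, k):
    """Return (mean, components); components has shape (k, d)."""
    mean = x.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    return mean, vt[:k]


def project(x, mean, components):
    """Center with the domain mean and project onto the PCA basis."""
    return (x - mean) @ components.T


def orthogonal_procrustes(a, b):
    """Orthogonal R minimizing ||a @ R - b||_F for row-paired a and b."""
    u, _, vt = np.linalg.svd(a.T @ b)
    return u @ vt


def recall_at_1(query, gallery):
    """Fraction of queries whose most cosine-similar gallery row shares their index."""
    q = query / np.linalg.norm(query, axis=1, keepdims=True)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    top1 = np.argmax(q @ g.T, axis=1)
    return float(np.mean(top1 == np.arange(len(query))))


# Synthetic stand-ins for pooled descriptors (assumption: no real data here).
rng = np.random.default_rng(0)
n, d, k = 50, 8, 4
scales = np.arange(d, 0, -1)                       # distinct per-axis variances
sat = rng.normal(size=(n, d)) * scales             # "satellite" features
q_rot, _ = np.linalg.qr(rng.normal(size=(d, d)))   # random orthogonal matrix
drone = sat @ q_rot + 5.0                          # toy gap: rotation plus shift

# Domain-wise PCA: each domain gets its own mean and basis.
sat_mean, sat_basis = fit_pca(sat, k)
drone_mean, drone_basis = fit_pca(drone, k)
sat_p = project(sat, sat_mean, sat_basis)
drone_p = project(drone, drone_mean, drone_basis)

# Fit the rotation on the first 20 rows, treated as hypothetical anchor pairs.
R = orthogonal_procrustes(drone_p[:20], sat_p[:20])

raw_recall = recall_at_1(drone, sat)
aligned_recall = recall_at_1(drone_p @ R, sat_p)
print(f"Recall@1 raw: {raw_recall:.2f}, aligned: {aligned_recall:.2f}")
```

Because the toy gap is exactly a rotation plus a shift, linear alignment can undo it completely, which is the best case for this kind of pipeline. Real drone and satellite features differ in more than a rigid transform, so the sketch shows the mechanics rather than expected accuracy.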
Problem

Research questions and friction points this paper is trying to address.

Cross-View Geo-Localization
Zero-Shot
Remote Sensing
Viewpoint Difference
Dataset Bias
Innovation

Methods, ideas, or system contributions that make the work stand out.

Zero-Shot Geo-Localization
Vision Foundation Models
Hierarchical Feature Alignment
Cross-View Matching
Statistical Manifold Alignment
Jun Lu
Information Engineering University, Zhengzhou, China
Zehao Sang
Information Engineering University, Zhengzhou, China
Haoqi Wei
Information Engineering University, Zhengzhou, China
Xiangyun Liu
Information Engineering University, Zhengzhou, China
Kun Zhu
Information Engineering University, Zhengzhou, China
Haitao Guo
Professor, University of Pittsburgh
Zhihui Gong
Information Engineering University, Zhengzhou, China
Lei Ding
Information Engineering University, Zhengzhou, China; Chinese Academy of Sciences, Beijing, China