Enhancing Cross-View UAV Geolocalization via LVLM-Driven Relational Modeling

📅 2026-03-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing cross-view drone geolocalization methods struggle to model the deep visual-semantic correspondences between drone and satellite imagery, which limits matching accuracy. This work proposes a plug-and-play ranking architecture that, for the first time, integrates large vision-language models (LVLMs) into this task to explicitly capture relational structure across views. To improve discriminative power and training stability, it introduces a soft-label-driven, relation-aware loss function. Experiments show that the method substantially improves the retrieval accuracy of existing baseline models on multiple benchmark datasets and maintains strong retrieval performance even under challenging conditions.
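
The summary contrasts baseline matching, which uses independently extracted per-view features, with a plug-and-play ranking stage that models each drone/satellite pair jointly. The following Python sketch illustrates that two-stage idea under stated assumptions. A baseline retriever ranks the whole satellite gallery by cosine similarity of independent embeddings, and a hypothetical `pair_scorer` then re-scores only the top-`k` shortlist. `pair_scorer` stands in for the LVLM-based relational model, whose interface the source does not describe. A real joint scorer would receive the image pair itself; here it receives the same toy vectors for brevity.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Similarity of independently extracted embeddings (the baseline heuristic)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_then_rerank(
    query: Vector,
    gallery: List[Vector],
    pair_scorer: Callable[[Vector, Vector], float],
    k: int = 10,
) -> List[int]:
    """Return gallery indices, best match first.

    Stage 1 ranks every satellite candidate with the baseline cosine score.
    Stage 2 re-scores only the top-k shortlist with a joint (query, candidate)
    scorer; `pair_scorer` is a hypothetical stand-in for the LVLM-based model.
    Candidates outside the shortlist keep their baseline order.
    """
    coarse = sorted(range(len(gallery)), key=lambda i: cosine(query, gallery[i]), reverse=True)
    shortlist, rest = coarse[:k], coarse[k:]
    reranked = sorted(shortlist, key=lambda i: pair_scorer(query, gallery[i]), reverse=True)
    return reranked + rest


if __name__ == "__main__":
    query = [1.0, 0.0]
    gallery = [[0.9, 0.1], [1.0, 0.05], [0.0, 1.0], [0.7, 0.7]]
    # Hypothetical joint score; a real system would run the relational model here.
    toy_scorer = lambda q, c: -abs(c[1] - 0.1)
    print(retrieve_then_rerank(query, gallery, toy_scorer, k=2))  # [0, 1, 3, 2]
```

Because only the shortlist is re-scored, the expensive joint model runs `k` times per query rather than once per gallery image. This is one plausible reason for a plug-and-play reranking design on top of an existing retriever. The source does not state the shortlist size or how the two scores are combined.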

📝 Abstract
The primary objective of cross-view UAV geolocalization is to determine the precise spatial coordinates of drone-captured imagery by matching it against large geo-referenced satellite databases. Current approaches typically extract features from each view independently and compute similarity with simple heuristics, so they fail to explicitly capture the interactions between views. To address this limitation, we introduce a novel plug-and-play ranking architecture that performs explicit joint relational modeling to improve UAV-to-satellite image matching. By harnessing a Large Vision-Language Model (LVLM), the framework learns the deep visual-semantic correlations linking UAV and satellite imagery. We further propose a relation-aware loss function for training. By employing soft labels, this loss provides fine-grained supervision that avoids over-penalizing near-positive matches, improving both the model's discriminative power and its training stability. Comprehensive evaluations across various baseline architectures and standard benchmarks show that the proposed method substantially improves the retrieval accuracy of existing models and yields superior performance even under highly demanding conditions.
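
The abstract says the relation-aware loss uses soft labels so that near-positive candidates are not over-penalized, but it gives no formula. One common way to realize that idea is a soft-target cross-entropy over a query's candidate scores. This formulation is an assumption, not taken from the paper. Each candidate's target weight decays with a hypothetical ground distance `distances_m` from the query's true location, controlled by a hypothetical temperature `tau_m`. The sketch below compares the gradient on each score under one-hot labels and under soft labels.

```python
import math
from typing import List, Sequence


def soft_labels(distances_m: Sequence[float], tau_m: float = 50.0) -> List[float]:
    """Assumed soft-label scheme: weight each candidate by exp(-distance / tau),
    so near-positives receive part of the target mass, then normalize to sum to 1."""
    weights = [math.exp(-d / tau_m) for d in distances_m]
    total = sum(weights)
    return [w / total for w in weights]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one query's relational (pair) scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_cross_entropy(scores: Sequence[float], targets: Sequence[float]) -> float:
    """Cross-entropy between the target distribution and softmax(scores)."""
    probs = softmax(scores)
    return -sum(t * math.log(p) for t, p in zip(targets, probs) if t > 0)


def score_gradient(scores: Sequence[float], targets: Sequence[float]) -> List[float]:
    """d(loss)/d(score_i) = p_i - t_i; a positive value pushes that score down."""
    return [p - t for p, t in zip(softmax(scores), targets)]


if __name__ == "__main__":
    scores = [2.0, 1.8, -1.0]  # 0: true match, 1: near-positive, 2: far negative
    hard = [1.0, 0.0, 0.0]
    soft = soft_labels([0.0, 20.0, 500.0])  # hypothetical distances in metres
    print(score_gradient(scores, hard)[1])
    print(score_gradient(scores, soft)[1])
```

With these toy scores, the gradient on the near-positive (index 1) is about 0.44 under one-hot labels but about 0.04 under soft labels. Training therefore stops aggressively pushing down a candidate that is nearly correct, which matches the motivation the abstract gives for soft labels.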
Problem

Research questions and friction points this paper is trying to address.

cross-view geolocalization
UAV
satellite imagery
image matching
relational modeling
Innovation

Methods, ideas, or system contributions that make the work stand out.

cross-view geolocalization
Large Vision-Language Model (LVLM)
relational modeling
soft-label loss
UAV-to-satellite matching