Weather-Robust Cross-View Geo-Localization via Prototype-Based Semantic Part Discovery

📅 2026-05-12

🤖 AI Summary
This work addresses the performance degradation in cross-view geo-localization caused by viewpoint discrepancies, altitude variations, and weather disturbances. To tackle these challenges, the authors propose SkyPart, a method that introduces a plug-in head atop a vision transformer to discover and group image patches into semantic parts via learnable prototype-based competitive assignment, thereby disentangling layout from texture. Height-conditional modulation is further incorporated to mitigate the influence of altitude information on feature embeddings. The model's robustness and generalization are enhanced through a graph attention readout mechanism and a Kendall uncertainty-weighted multi-task loss. Experiments demonstrate that SkyPart achieves new state-of-the-art results on the SUES-200, University-1652, and DenseUAV benchmarks, significantly outperforming existing approaches under ten diverse weather perturbations in WeatherPrompt, while maintaining a compact model size of only 26.95 million parameters.
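As a rough illustration of the prototype-based competitive assignment mentioned above, the sketch below soft-assigns patch tokens to prototypes by cosine similarity, lets prototypes compete through a softmax over each patch's similarities, and pools one descriptor per part. The function names, the temperature value, and the softmax form of the competition are assumptions made for illustration only; the summary does not specify them, and the actual SkyPart head may differ.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length (zero vectors are returned unchanged).
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def assign_patches(patches, prototypes, temperature=0.1):
    """Return one row of assignment weights per patch (rows sum to 1).

    Hypothetical helper: cosine similarity followed by a softmax across
    prototypes, so prototypes compete for each patch token.
    """
    protos = [l2_normalize(p) for p in prototypes]
    weights = []
    for token in patches:
        t = l2_normalize(token)
        sims = [sum(a * b for a, b in zip(t, p)) / temperature for p in protos]
        m = max(sims)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in sims]
        z = sum(exps)
        weights.append([e / z for e in exps])
    return weights

def pool_parts(patches, weights):
    """Weighted mean of patch tokens for each prototype (one part each)."""
    num_parts, dim = len(weights[0]), len(patches[0])
    parts = []
    for j in range(num_parts):
        mass = sum(w[j] for w in weights) + 1e-8
        parts.append([
            sum(w[j] * tok[i] for w, tok in zip(weights, patches)) / mass
            for i in range(dim)
        ])
    return parts

# Toy example: three 2-D "patch tokens" and two prototypes.
patches = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
prototypes = [[1.0, 0.0], [0.0, 1.0]]
weights = assign_patches(patches, prototypes)
parts = pool_parts(patches, weights)
```

In this toy setup the first two patches are claimed almost entirely by the first prototype and the third patch by the second, so each pooled part summarizes one group of patches.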
πŸ“ Abstract
Cross-view geo-localization (CVGL), which matches an oblique drone view to a geo-referenced satellite tile, has emerged as a key alternative for autonomous drone navigation when GNSS signals are jammed, spoofed, or unavailable. Despite strong recent progress, three limitations persist: (1) global-descriptor designs compress the patch grid into a single vector without separating layout from texture across the view gap; (2) altitude-related scale variation is retained in the learned embedding rather than marginalized; and (3) multi-objective training relies on hand-tuned scalars over losses on incompatible gradient scales. We propose SkyPart, a lightweight swappable head for patch-based vision transformers (ViTs) that institutes explicit part grouping over the patch grid. SkyPart has four theory-grounded components: (i) learnable prototypes competing for patch tokens via single-pass cosine assignment; (ii) altitude-conditioned linear modulation applied only during training, making the retrieval embedding altitude-free at inference; (iii) a graph-attention readout over active prototypes; and (iv) a Kendall uncertainty-weighted multi-objective loss whose stationary points are Pareto-stationary. At 26.95M parameters and 22.14 GFLOPs, SkyPart is the smallest among top-performing methods and sets a new state of the art on SUES-200, University-1652, and DenseUAV under a single-pass, no-re-ranking, no-TTA protocol. Its advantage over the strongest baseline widens under the ten-condition WeatherPrompt corruption benchmark.
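To make the contrast with hand-tuned loss scalars concrete, here is a minimal sketch of one widely used form of Kendall-style homoscedastic uncertainty weighting, in which each objective i gets a learnable log-variance s_i and contributes 0.5 * exp(-s_i) * L_i + 0.5 * s_i. The abstract does not give the exact formulation used in SkyPart, so this form, the function names, and the toy loss values are assumptions.

```python
import math

def kendall_weighted_loss(losses, log_vars):
    """Combine task losses using per-task log-variances s_i = log(sigma_i^2).

    Assumed form: sum_i 0.5 * exp(-s_i) * L_i + 0.5 * s_i.
    The exp(-s_i) factor down-weights high-uncertainty objectives, while
    the + 0.5 * s_i term stops s_i from growing without bound.
    """
    return sum(
        0.5 * math.exp(-s) * loss + 0.5 * s
        for loss, s in zip(losses, log_vars)
    )

def stationary_log_vars(losses):
    """Closed-form minimizer in s_i for fixed positive losses.

    Setting d/ds [0.5 * exp(-s) * L + 0.5 * s] = 0 gives s = log(L).
    """
    return [math.log(loss) for loss in losses]

# Toy example: two objectives on very different scales.
losses = [4.0, 0.25]
equal_weight_total = kendall_weighted_loss(losses, [0.0, 0.0])
best_log_vars = stationary_log_vars(losses)
balanced_total = kendall_weighted_loss(losses, best_log_vars)
```

At the stationary log-variances each objective's weighted term equals 0.5 regardless of its raw scale, which is the sense in which this scheme balances losses on incompatible gradient scales without manual tuning. In practice the log-variances would be trained jointly with the network rather than solved in closed form.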
Problem

Research questions and friction points this paper is trying to address.

cross-view geo-localization
weather robustness
scale variation
multi-objective training
semantic part discovery
Innovation

Methods, ideas, or system contributions that make the work stand out.

prototype-based part discovery
altitude-invariant embedding
graph-attention readout
uncertainty-weighted multi-objective loss
weather-robust geo-localization
Chi-Nguyen Tran
Faculty of Information Technology, University of Science, Vietnam National University, Ho Chi Minh City, Vietnam
Dao Sy Duy Minh
Faculty of Information Technology, University of Science, Vietnam National University, Ho Chi Minh City, Vietnam
Huynh Trung Kiet
Faculty of Information Technology, University of Science, Vietnam National University, Ho Chi Minh City, Vietnam
Nguyen Lam Phu Quy
Faculty of Information Technology, University of Science, Vietnam National University, Ho Chi Minh City, Vietnam
Phu-Hoa Pham
Faculty of Information Technology, University of Science, Vietnam National University, Ho Chi Minh City, Vietnam
Long Tran-Thanh
Professor in Computer Science, University of Warwick
Artificial Intelligence, AI for social good, game theory, human-agent learning, multi-armed bandits