🤖 AI Summary
This work addresses cross-view geo-localization for UAVs in GPS-denied environments, where texture discrepancies and weather variations cause domain shifts and densely distributed fine-grained objects act as visual distractors. To tackle these issues, the paper introduces a framework that, for the first time, integrates the information bottleneck principle with object-centric learning, combining structural relation alignment with a cross-view knowledge constraint. Within this information-theoretic framework, the method aligns the structural relationships among objects across views while suppressing view-specific noise. Extensive experiments show that the proposed approach significantly outperforms state-of-the-art methods across multiple benchmarks and challenging scenarios, with improved robustness and generalization.
📝 Abstract
Cross-view geo-localization (CVGL), which matches ground or UAV imagery against satellite views, is fundamental for precise localization and navigation in GPS-denied environments. Existing approaches rely on global feature alignment and often suffer from substantial domain shifts induced by varying regional textures and weather conditions. This issue becomes even more pronounced in UAV-based scenarios, where the broader perspective inevitably captures dense, fine-grained objects that create significant visual clutter. To address this, we draw inspiration from Object-Centric Learning (OCL) and propose InfoGeo, an information-theoretic framework designed to enhance robustness and generalization. InfoGeo reformulates the optimization as an information bottleneck process with two core objectives: (i) maximizing view-invariant information by aligning the object-centric structural relations across views, and (ii) minimizing view-specific noisy signals through cross-view knowledge constraints. Extensive evaluations across diverse benchmarks and challenging scenarios demonstrate that InfoGeo significantly outperforms state-of-the-art methods.
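For orientation, the two objectives above echo the generic information bottleneck trade-off, which can be sketched as follows. This is the standard textbook form, not necessarily the loss InfoGeo optimizes; the abstract does not state its exact formulation. The symbols are assumptions introduced here for illustration: X is the input view, Y is the view-invariant target (for example, the matching location), Z is the learned representation, and β > 0 is a trade-off weight.

```latex
% Generic information bottleneck objective (illustrative; symbols are assumptions)
\max_{Z} \; I(Z; Y) \;-\; \beta \, I(Z; X)
```

Read loosely, objective (i) corresponds to increasing the relevance term I(Z; Y), and objective (ii) to shrinking the compression term I(Z; X) so that view-specific detail irrelevant to Y is discarded.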