Robust Scene Coordinate Regression via Geometrically-Consistent Global Descriptors

📅 2025-12-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing learning-based visual localization methods construct global descriptors from geometric cues alone—such as co-visibility graphs—which weakens discriminability in visually similar scenes and robustness to geometric noise. To address these limitations, this paper proposes a geometry–vision consistency-driven framework for global descriptor learning: a descriptor aggregation module that jointly enforces geometric consistency and visual similarity. The approach uses unsupervised batch sampling and contrastive learning driven solely by image overlap scores, eliminating the need for manual scene annotations, together with an improved contrastive loss in an end-to-end trainable architecture. On large-scale, challenging benchmarks, the method achieves significant improvements in localization accuracy while maintaining computational efficiency, memory efficiency, and strong generalization across diverse environments.

📝 Abstract
Recent learning-based visual localization methods use global descriptors to disambiguate visually similar places, but existing approaches often derive these descriptors from geometric cues alone (e.g., covisibility graphs), limiting their discriminative power and reducing robustness in the presence of noisy geometric constraints. We propose an aggregator module that learns global descriptors consistent with both geometrical structure and visual similarity, ensuring that images are close in descriptor space only when they are visually similar and spatially connected. This corrects erroneous associations caused by unreliable overlap scores. Using a batch-mining strategy based solely on the overlap scores and a modified contrastive loss, our method trains without manual place labels and generalizes across diverse environments. Experiments on challenging benchmarks show substantial localization gains in large-scale environments while preserving computational and memory efficiency. Code is available at github.com/sontung/robust_scr.
Problem

Research questions and friction points this paper is trying to address.

Improves visual localization by integrating geometric and visual cues
Corrects erroneous associations from unreliable geometric constraints
Trains without manual labels for generalization across environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Learns global descriptors combining geometric and visual similarity
Uses overlap-based batch mining without manual labels
Modifies contrastive loss for robust large-scale localization
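The overlap-driven supervision above can be illustrated with a minimal sketch: instead of manual place labels, the positive/negative decision for a descriptor pair comes from an image-overlap score. This is an illustrative formulation, not the paper's exact loss; `pos_thresh` and `margin` are hypothetical parameter choices.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def overlap_contrastive_loss(desc_a, desc_b, overlap, pos_thresh=0.3, margin=1.0):
    """Classic contrastive loss where the pair label is derived from an
    image-overlap score rather than a manual annotation (sketch only;
    pos_thresh and margin are illustrative assumptions)."""
    d = euclidean(desc_a, desc_b)
    if overlap >= pos_thresh:          # overlapping views: pull descriptors together
        return d ** 2
    return max(0.0, margin - d) ** 2   # disjoint views: push apart, up to the margin
```

For example, two identical descriptors with high overlap incur zero loss, while two well-separated descriptors with zero overlap also incur zero loss once their distance exceeds the margin.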