🤖 AI Summary
This work investigates how semantic priors shape localisation performance and robustness within a self-supervised contrastive learning framework, focusing on noise suppression and selective attention to discriminative landmarks rather than generic clutter. A semantic-class ablation reveals an implicit weighting mechanism: the model intrinsically down-weights high-frequency, low-discriminative objects. To validate interpretability under visual and structural variation, the study combines gradient-based attribution (integrated gradients), attention visualisation, semantic scene-graph modelling, and post-hoc introspection analysis. Experimental results demonstrate that the learned location representations exhibit both noise resilience and semantic saliency, enabling stable, interpretable cross-view matching across diverse challenging scenarios, including occlusion, viewpoint shifts, and environmental degradation.
📝 Abstract
This work investigates how semantics influence localisation performance and robustness in a self-supervised, contrastive semantic localisation framework. After training a localisation network on both original and perturbed maps, we conduct a thorough post-hoc introspection analysis to probe whether the model filters environmental noise and prioritises distinctive landmarks over routine clutter. We evaluate several interpretability methods and present a comparative reliability analysis; integrated gradients and attention weights consistently emerge as the most reliable probes of learned behaviour. A semantic-class ablation further reveals an implicit weighting in which frequent objects are often down-weighted. Overall, the results indicate that the model learns noise-robust, semantically salient representations of place, thereby enabling explainable registration under challenging visual and structural variations.
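To make the attribution probe concrete, the sketch below shows the standard integrated-gradients recipe (a Riemann-sum approximation of the path integral from a baseline to the input) applied to a toy scalar "similarity score". The score function, its weights, and the input are hypothetical stand-ins, not the paper's localisation network; a real probe would differentiate the trained model instead of using finite differences.

```python
import numpy as np

def numerical_grad(f, x, eps=1e-5):
    """Central-difference gradient of a scalar function f at x.
    (A stand-in for autodiff on the actual network.)"""
    g = np.zeros_like(x)
    for i in range(x.size):
        xp, xm = x.copy(), x.copy()
        xp[i] += eps
        xm[i] -= eps
        g[i] = (f(xp) - f(xm)) / (2 * eps)
    return g

def integrated_gradients(f, x, baseline, steps=64):
    """Midpoint-rule approximation of integrated gradients:
    (x - baseline) * mean of grad f along the straight path."""
    alphas = (np.arange(steps) + 0.5) / steps
    total = np.zeros_like(x)
    for a in alphas:
        total += numerical_grad(f, baseline + a * (x - baseline))
    return (x - baseline) * total / steps

# Hypothetical score: a weighted sum that emphasises two
# "landmark" dimensions (indices 1 and 3) over "clutter" dims.
w = np.array([0.1, 2.0, 0.05, 1.5])
f = lambda z: float(w @ z)

x = np.ones(4)
baseline = np.zeros_like(x)
attr = integrated_gradients(f, x, baseline)
# Completeness check: attributions sum to f(x) - f(baseline),
# and the landmark dimensions receive the largest attributions.
```

For this linear toy score the attributions are exact, so the completeness property (attributions summing to the score difference) holds to numerical precision; for a real network it holds only up to the Riemann-sum error, which shrinks as `steps` grows.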