StructVPR++: Distill Structural and Semantic Knowledge with Weighting Samples for Visual Place Recognition

📅 2025-03-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Visual Place Recognition (VPR) faces dual challenges in autonomous driving and robotics: insufficient semantic discriminability of global features and high computational overhead in re-ranking. This paper proposes an end-to-end, RGB-only global feature learning framework to bridge the accuracy–efficiency gap. First, we introduce a novel label-aware feature disentanglement mechanism that enables explicit semantic alignment at inference time—without requiring segmentation masks. Second, we design segmentation-guided knowledge distillation and sample-weighted loss to dynamically suppress noisy image pairs and strengthen reliable supervision signals. Evaluated on four standard benchmarks, our method achieves 5–23% improvements in Recall@1 over state-of-the-art global-feature-based approaches, matching the performance of two-stage methods while enabling real-time, single-frame inference.

📝 Abstract
Visual place recognition is a challenging task for autonomous driving and robotics, and is usually formulated as an image retrieval problem. A commonly used two-stage strategy involves global retrieval followed by re-ranking with patch-level descriptors. Most end-to-end deep learning methods cannot extract global features with sufficient semantic information from RGB images. In contrast, re-ranking can exploit more explicit structural and semantic information in the one-to-one matching process, but it is time-consuming. To bridge the gap between global retrieval and re-ranking and achieve a good trade-off between accuracy and efficiency, we propose StructVPR++, a framework that embeds structural and semantic knowledge into RGB global representations via segmentation-guided distillation. Our key innovation lies in decoupling label-specific features from global descriptors, enabling explicit semantic alignment between image pairs without requiring segmentation during deployment. Furthermore, we introduce a sample-wise weighted distillation strategy that prioritizes reliable training pairs while suppressing noisy ones. Experiments on four benchmarks demonstrate that StructVPR++ surpasses state-of-the-art global methods by 5–23% in Recall@1 and even outperforms many two-stage approaches, achieving real-time efficiency with a single RGB input.
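The label-specific decoupling described in the abstract can be illustrated with a minimal sketch. It assumes the global descriptor is laid out as equal-sized, label-aligned chunks; the chunk layout, `num_labels`, and function names are hypothetical illustrations, not the paper's actual design:

```python
import numpy as np

def decouple_descriptor(global_desc, num_labels):
    # Split the global descriptor into equal, label-specific chunks
    # (hypothetical layout: chunk i corresponds to semantic label i).
    return np.split(np.asarray(global_desc, dtype=float), num_labels)

def aligned_similarity(desc_a, desc_b, num_labels):
    # Compare two descriptors label-by-label (explicit semantic alignment)
    # and average the per-label cosine similarities.
    sims = []
    for fa, fb in zip(decouple_descriptor(desc_a, num_labels),
                      decouple_descriptor(desc_b, num_labels)):
        denom = np.linalg.norm(fa) * np.linalg.norm(fb) + 1e-12
        sims.append(float(fa @ fb) / denom)
    return float(np.mean(sims))
```

Because alignment happens inside the descriptor layout itself, no segmentation masks are needed at query time, which is what allows the method to run as a single-stage global retrieval.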
Problem

Research questions and friction points this paper is trying to address.

Bridges the gap between global retrieval and re-ranking in visual place recognition
Embeds structural and semantic knowledge into RGB-only global representations
Improves the accuracy–efficiency trade-off for autonomous driving and robotics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Segmentation-guided distillation for knowledge embedding
Decoupling label-specific features for semantic alignment
Sample-wise weighted distillation to prioritize reliable pairs
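As a rough illustration of the sample-wise weighting idea above, the sketch below scales a per-pair distillation loss by a reliability weight, so noisy pairs contribute less to the gradient. The KL-based loss and the weight source are assumptions for illustration, not the paper's exact formulation:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def weighted_distillation_loss(student_logits, teacher_logits, weights):
    # Per-sample KL(teacher || student), combined as a weighted average so
    # reliable pairs (large weight) dominate and noisy pairs are suppressed.
    p_t = softmax(np.asarray(teacher_logits, dtype=float))
    p_s = softmax(np.asarray(student_logits, dtype=float))
    per_sample_kl = (p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12))).sum(axis=-1)
    w = np.asarray(weights, dtype=float)
    return float((w * per_sample_kl).sum() / (w.sum() + 1e-12))
```

Down-weighting a mismatched pair shrinks its share of the loss, which mirrors the paper's goal of strengthening reliable supervision while suppressing noisy image pairs.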
Yanqing Shen
Xi'an Jiaotong University; ETH
visual place recognition, image representation
Sanping Zhou
Xi'an Jiaotong University
Computer Vision, Machine Learning
Jingwen Fu
Xi'an Jiaotong University
Computer Vision, Machine Learning
Ruotong Wang
National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, National Engineering Research Center for Visual Information and Applications, and Institute of Artificial Intelligence and Robotics, Xi’an Jiaotong University, Xi’an, Shaanxi 710049, China
Shitao Chen
Xi'an Jiaotong University
Nanning Zheng
Xi'an Jiaotong University