Focus on Local: Finding Reliable Discriminative Regions for Visual Place Recognition

📅 2025-04-11
🏛️ AAAI Conference on Artificial Intelligence
📈 Citations: 0
Influential: 0
🤖 AI Summary
Visual place recognition (VPR) suffers in both efficiency and accuracy from three problems: insufficient modeling of discriminative local regions, false matches induced by background, and the lack of local supervision. To address them, this paper proposes a weakly supervised local feature learning framework. The method introduces: (1) a spatial alignment loss (SAL) and a foreground-background contrastive enhancement loss (CEL) that localize local regions precisely and enhance their discriminability; (2) a weakly supervised local training paradigm that generates pseudo-correspondences from global features, eliminating the need for manual annotations; and (3) an efficient re-ranking pipeline guided by discriminative regions. Evaluated on mainstream VPR benchmarks, the approach achieves state-of-the-art performance in both image retrieval and re-ranking. Compared to two-stage methods, it improves re-ranking speed by 3.2× and recall by 4.7%.

📝 Abstract
Visual Place Recognition (VPR) aims to predict the location of a query image by referencing a database of geotagged images. In VPR, a small number of discriminative local regions in an image often matter most, while mundane background regions contribute little or even cause perceptual aliasing because they readily overlap across places. However, existing methods lack precise modeling and full exploitation of these discriminative regions. In addition, the lack of pixel-level correspondence supervision in VPR datasets hinders further improvement of local feature matching in the re-ranking stage. In this paper, we propose the Focus on Local (FoL) approach, which improves image retrieval and re-ranking in VPR simultaneously by mining and exploiting reliable discriminative local regions in images and by introducing pseudo-correlation supervision. First, we design two losses, the Extraction-Aggregation Spatial Alignment Loss (SAL) and the Foreground-Background Contrast Enhancement Loss (CEL), to explicitly model reliable discriminative local regions and use them to guide both the generation of global representations and efficient re-ranking. Second, we introduce a weakly-supervised local feature training strategy based on pseudo-correspondences obtained from aggregating global features, which alleviates the lack of ground-truth local correspondences for the VPR task. Third, we propose a re-ranking pipeline that is both efficient and precise, guided by the discriminative regions. Finally, experimental results show that FoL achieves state-of-the-art performance on multiple VPR benchmarks in both the image retrieval and re-ranking stages, and also significantly outperforms existing two-stage VPR methods in computational efficiency.
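The two-stage structure described above (global retrieval, then re-ranking restricted to discriminative regions) can be illustrated with a minimal NumPy sketch. This is not the FoL implementation: the function names, the per-feature discriminativeness scores, and the use of mutual nearest-neighbour counting as the matching criterion are all assumptions chosen for illustration, and the abstract does not specify how FoL scores or matches regions.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def retrieve_top_k(query_global, db_globals, k):
    """Stage 1: rank database images by cosine similarity of global descriptors."""
    sims = l2_normalize(db_globals) @ l2_normalize(query_global)
    return np.argsort(-sims)[:k]

def select_discriminative(local_feats, scores, keep):
    """Keep the `keep` local features with the highest (assumed) region scores."""
    idx = np.argsort(-scores)[:keep]
    return l2_normalize(local_feats[idx])

def mutual_nn_matches(a, b):
    """Count mutual nearest-neighbour pairs between two unit-norm feature sets.

    Generic stand-in for a local matcher, not the paper's matching rule.
    """
    sim = a @ b.T
    nn_ab = sim.argmax(axis=1)  # best match in b for each feature in a
    nn_ba = sim.argmax(axis=0)  # best match in a for each feature in b
    return int(np.sum(nn_ba[nn_ab] == np.arange(a.shape[0])))

def rerank(query_local, query_scores, db_locals, db_scores, candidates, keep):
    """Stage 2: reorder stage-1 candidates by match count on selected regions."""
    candidates = np.asarray(candidates)
    q = select_discriminative(query_local, query_scores, keep)
    counts = np.array([
        mutual_nn_matches(q, select_discriminative(db_locals[i], db_scores[i], keep))
        for i in candidates
    ])
    # Stable sort keeps the stage-1 order among candidates with equal counts.
    return candidates[np.argsort(-counts, kind="stable")]
```

Restricting matching to the `keep` highest-scoring features is what makes the second stage cheaper in this sketch: the similarity matrix shrinks from all local features to a small selected subset per image.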
Problem

Research questions and friction points this paper is trying to address.

Identifying reliable discriminative regions for Visual Place Recognition
Lack of precise modeling in existing VPR methods
Improving image retrieval and re-ranking efficiency in VPR
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mining reliable discriminative local regions
Weakly-supervised local feature training strategy
Efficient re-ranking pipeline with region guidance
Authors

Changwei Wang
Shandong Computer Science Center
Multimodal Learning, Embodied AI, Edge Intelligent Computing, AI for Healthcare, Safety Alignment

Shunpeng Chen
School of Artificial Intelligence, Beijing University of Posts and Telecommunications

Yukun Song
School of Artificial Intelligence, Beijing University of Posts and Telecommunications

Rongtao Xu
MBZUAI << CASIA << HUST
Intelligent Robot, Embodied AI, VLA, VLM, Spatialtemporal AI

Zherui Zhang
School of Artificial Intelligence, Beijing University of Posts and Telecommunications

Jiguang Zhang
MAIS, Institute of Automation, Chinese Academy of Sciences

Haoran Yang
Central South University
Graph Neural Networks, Data Mining, Recommendation Systems

Yu Zhang
Tongji University

Kexue Fu
City University of Hong Kong
HCI, Storytelling, Creativity, Cognition, Human-AI collaboration

Shide Du
University of Zurich
Trustworthy learning, interpretable deep learning, open-set learning, multi-view learning

Zhiwei Xu
Shandong University

Longxiang Gao
Professor, Qilu University of Technology; Adjunct Professor, University of Southern Queensland
Edge AI, Federated Learning, Machine Learning, Quantum Computing

Li Guo
School of Artificial Intelligence, Beijing University of Posts and Telecommunications

Shibiao Xu
Beijing University of Posts and Telecommunications
Computer Vision, Machine Learning, Computer Graphics