SuperPlace: The Renaissance of Classical Feature Aggregation for Visual Place Recognition in the Era of Foundation Models

📅 2025-06-16
🤖 AI Summary
In the foundation model (FM) era, classical feature aggregation methods have been overlooked, and cross-dataset training remains inconsistent. To address these issues, this work revisits and enhances the GeM and NetVLAD paradigms. We propose a supervised label alignment framework for joint training across multiple Visual Place Recognition (VPR) datasets; introduce a dual-GeM architecture (G²M) to improve channel-wise feature calibration; and propose a secondary fine-tuning strategy (FT²) for NetVLAD-Linear (NVL), which compresses high-dimensional NetVLAD descriptors through a single linear layer. Experiments demonstrate that G²M achieves state-of-the-art performance at merely 1/10 the embedding dimension; NVL-FT² ranks first on the MSLS leaderboard; and our methods consistently outperform existing FM-driven approaches across multiple benchmarks. Our core contribution lies in revitalizing classical aggregation methods—establishing an FM-compatible, efficient, and robust VPR paradigm grounded in principled feature aggregation.
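As a concrete illustration of the dual-GeM idea above, here is a minimal NumPy sketch: GeM pooling raises activations to a power p before averaging over spatial positions, and a second GeM branch produces a channel-wise gate that calibrates the first branch's output. The sigmoid gating form and the parameter names (`p_main`, `p_gate`) are assumptions for illustration, not the paper's exact G²M formulation.

```python
import numpy as np

def gem_pool(x, p=3.0, eps=1e-6):
    """Generalized-mean (GeM) pooling of a (C, H, W) feature map.

    Per channel: (mean over H*W of x^p)^(1/p). With p = 1 this is
    average pooling; as p -> inf it approaches max pooling.
    """
    x = np.clip(x, eps, None)                     # GeM assumes positive inputs
    return np.power(np.power(x, p).mean(axis=(1, 2)), 1.0 / p)

def g2m_pool(x, p_main=3.0, p_gate=1.0):
    """Hypothetical dual-GeM (G^2M) sketch: one GeM branch yields the
    descriptor, a second GeM branch yields a channel-wise calibration
    gate applied multiplicatively (gating form is an assumption)."""
    desc = gem_pool(x, p=p_main)                  # main descriptor branch
    gate = gem_pool(x, p=p_gate)                  # calibration branch
    gate = 1.0 / (1.0 + np.exp(-(gate - gate.mean())))  # mean-centered sigmoid
    return desc * gate
```

The appeal of GeM here is that a single scalar p interpolates smoothly between average and max pooling, so the pooling behavior itself can be learned per model or per branch.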

📝 Abstract
Recent visual place recognition (VPR) approaches have leveraged foundation models (FM) and introduced novel aggregation techniques. However, these methods have failed to fully exploit key concepts of FM, such as the effective utilization of extensive training sets, and they have overlooked the potential of classical aggregation methods, such as GeM and NetVLAD. Building on these insights, we revive classical feature aggregation methods and develop more fundamental VPR models, collectively termed SuperPlace. First, we introduce a supervised label alignment method that enables training across various VPR datasets within a unified framework. Second, we propose G$^2$M, a compact feature aggregation method utilizing two GeMs, where one GeM learns the principal components of feature maps along the channel dimension and calibrates the output of the other. Third, we propose the secondary fine-tuning (FT$^2$) strategy for NetVLAD-Linear (NVL). NetVLAD first learns feature vectors in a high-dimensional space and then compresses them into a lower-dimensional space via a single linear layer. Extensive experiments highlight our contributions and demonstrate the superiority of SuperPlace. Specifically, G$^2$M achieves promising results with only one-tenth of the feature dimensions compared to recent methods. Moreover, NVL-FT$^2$ ranks first on the MSLS leaderboard.
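The NetVLAD-Linear pipeline the abstract describes — soft-assign local descriptors to K cluster centers, aggregate residuals into a K×D vector, then compress it with a single linear layer — can be sketched in NumPy as follows. The centers, the softmax temperature `alpha`, and the compression matrix `W` are placeholders here; in the paper they would be learned end-to-end, with FT² then fine-tuning the compressed model in a second stage.

```python
import numpy as np

def netvlad(x, centers, alpha=10.0):
    """VLAD aggregation with soft assignment (sketch).

    x: (N, D) local descriptors; centers: (K, D) cluster centroids.
    Returns an L2-normalized vector of length K*D.
    """
    d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)   # (N, K)
    logits = -alpha * d2
    logits -= logits.max(axis=1, keepdims=True)                 # numerical stability
    a = np.exp(logits)
    a /= a.sum(axis=1, keepdims=True)                           # soft assignment
    resid = x[:, None, :] - centers[None, :, :]                 # (N, K, D) residuals
    v = (a[:, :, None] * resid).sum(axis=0)                     # (K, D)
    v /= np.linalg.norm(v, axis=1, keepdims=True) + 1e-12       # intra-normalization
    v = v.reshape(-1)
    return v / (np.linalg.norm(v) + 1e-12)

def nvl(x, centers, W):
    """NetVLAD-Linear (sketch): compress the K*D VLAD vector to a
    low-dimensional descriptor with one linear map W (K*D -> d_low)."""
    return netvlad(x, centers) @ W
```

The design point this sketch illustrates is that all of NetVLAD's capacity stays in the high-dimensional space during training, while retrieval-time storage cost is set entirely by the single linear layer.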
Problem

Research questions and friction points this paper is trying to address.

Reviving classical feature aggregation for VPR
Enhancing FM utilization in VPR models
Improving feature dimension efficiency in VPR
Innovation

Methods, ideas, or system contributions that make the work stand out.

Supervised label alignment for unified VPR training
G$^2$M: Dual GeM for compact feature aggregation
NetVLAD-Linear with secondary fine-tuning (FT$^2$)
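The supervised label alignment contribution can be pictured as offsetting each dataset's place IDs into one disjoint global label space, so that a single classification objective spans all VPR datasets. The helper below is a hypothetical sketch of that bookkeeping, not the paper's implementation.

```python
def align_labels(datasets):
    """Map per-dataset place IDs into one global label space (sketch).

    datasets: dict mapping dataset name -> list of local place IDs.
    Returns (global_ids, num_classes), where each dataset's labels
    occupy a disjoint contiguous range of the global space.
    """
    offset, global_ids = 0, {}
    for name, ids in datasets.items():
        uniq = sorted(set(ids))
        remap = {pid: offset + i for i, pid in enumerate(uniq)}
        global_ids[name] = [remap[p] for p in ids]
        offset += len(uniq)                 # next dataset starts past this range
    return global_ids, offset
```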
Authors

Bingxi Liu — Peng Cheng Laboratory, Shenzhen, China
Pengju Zhang — University of Bristol
Li He — Southern University of Science and Technology, Shenzhen, China
Hao Chen — Cambridge University, Cambridge, UK
Shiyi Guo — Institute of Automation, Chinese Academy of Sciences, Beijing, China
Yihong Wu — Institute of Automation, Chinese Academy of Sciences, Beijing, China
Jinqiang Cui — Peng Cheng Laboratory, Shenzhen, China
Hong Zhang — Southern University of Science and Technology, Shenzhen, China