Feature Complementation Architecture for Visual Place Recognition

📅 2025-06-14
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
To address insufficient feature robustness under environmental variations in Visual Place Recognition (VPR), this paper proposes the Local-Global Complementary Network (LGCN). Methodologically, LGCN employs a parallel CNN-ViT hybrid architecture to jointly model local details and global contextual semantics; introduces a Dynamic Feature Fusion Module (DFM) that jointly captures spatial and channel-wise dependencies; and incorporates a lightweight frequency-spatial adapter to enable efficient task-specific adaptation while keeping the ViT backbone frozen. Experiments demonstrate that LGCN achieves substantial improvements in cross-condition localization accuracy and robustness across multiple VPR benchmarks, consistently outperforming state-of-the-art methods. Key contributions include: (i) the first DFM mechanism for adaptive multi-scale feature integration; (ii) a novel frequency-spatial collaborative adaptation paradigm; and (iii) an efficient frozen-backbone fine-tuning strategy that balances computational efficiency and performance.

πŸ“ Abstract
Visual place recognition (VPR) plays a crucial role in robotic localization and navigation. The key challenge lies in constructing feature representations that are robust to environmental changes. Existing methods typically adopt convolutional neural networks (CNNs) or vision Transformers (ViTs) as feature extractors. However, these architectures excel in different aspects: CNNs are effective at capturing local details, while ViTs are better suited for modeling global context, making it difficult to leverage the strengths of both. To address this issue, we propose a local-global feature complementation network (LGCN) for VPR, which integrates a parallel CNN-ViT hybrid architecture with a dynamic feature fusion module (DFM). The DFM performs dynamic feature fusion through joint modeling of spatial and channel-wise dependencies. Furthermore, to enhance the expressiveness and adaptability of the ViT branch for VPR tasks, we introduce lightweight frequency-to-spatial fusion adapters into the frozen ViT backbone. These adapters enable task-specific adaptation with controlled parameter overhead. Extensive experiments on multiple VPR benchmark datasets demonstrate that the proposed LGCN consistently outperforms existing approaches in terms of localization accuracy and robustness, validating its effectiveness and generalizability.
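The abstract describes the DFM as fusing the CNN and ViT branches through joint spatial and channel-wise gating. The snippet below is a minimal, parameter-free NumPy sketch of that gating idea only; the function name `dfm_fuse` and the exact gate construction are illustrative assumptions, not the paper's implementation, which uses learned weights.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dfm_fuse(f_cnn, f_vit):
    """Fuse a CNN feature map and a ViT feature map (both C x H x W)
    with jointly derived channel and spatial gates (illustrative sketch)."""
    combined = f_cnn + f_vit
    # channel-wise gate: global average pooling over spatial positions
    chan_gate = sigmoid(combined.mean(axis=(1, 2)))   # shape (C,)
    # spatial gate: average over channels at each position
    spat_gate = sigmoid(combined.mean(axis=0))        # shape (H, W)
    # joint gate combines both dependency types
    gate = chan_gate[:, None, None] * spat_gate[None, :, :]
    # convex combination of the two branches, position by position
    return gate * f_cnn + (1.0 - gate) * f_vit
```

Because the gate lies in (0, 1), each fused value is an elementwise blend of the two branch features, which is the general shape a learned fusion module of this kind takes.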
Problem

Research questions and friction points this paper is trying to address.

Robust feature representation for visual place recognition
Integrating CNN and ViT strengths for better VPR
Dynamic feature fusion for improved localization accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Parallel CNN-ViT hybrid architecture for VPR
Dynamic feature fusion module jointly modeling spatial and channel-wise dependencies
Lightweight frequency-to-spatial fusion adapters
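The third innovation injects lightweight frequency-to-spatial adapters into the frozen ViT branch. As a hedged illustration of how a frequency-domain residual adapter can operate on a spatial feature map, here is a NumPy sketch; the function name, the low-pass mask, and the `cutoff`/`alpha` hyperparameters are all assumptions for exposition, not the paper's adapter design (which is learned and trained end-to-end).

```python
import numpy as np

def freq_spatial_adapter(x, cutoff=0.25, alpha=0.1):
    """Residual adapter sketch: low-pass the feature map in the
    frequency domain and add the result back to the spatial features.
    x: array of shape (C, H, W); cutoff and alpha are illustrative."""
    c, h, w = x.shape
    # move to the frequency domain, centering the low frequencies
    X = np.fft.fftshift(np.fft.fft2(x, axes=(-2, -1)), axes=(-2, -1))
    # circular low-pass mask keeping coarse structural content
    yy, xx = np.mgrid[0:h, 0:w]
    dist = np.sqrt((yy - h / 2) ** 2 + (xx - w / 2) ** 2)
    mask = (dist <= cutoff * min(h, w)).astype(float)
    # back to the spatial domain
    X_low = np.fft.ifftshift(X * mask, axes=(-2, -1))
    low = np.fft.ifft2(X_low, axes=(-2, -1)).real
    # residual injection: the frozen-branch output is only perturbed
    return x + alpha * low
```

The residual form (`x + alpha * low`) is what keeps such an adapter "lightweight": the frozen backbone's features pass through unchanged when `alpha` is zero, so only a small correction is learned per task.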
Authors
Weiwei Wang, Anhui University. Interests: micromagnetic simulation.
Meijia Wang, School of Electronic Information and Artificial Intelligence, Shaanxi University of Science and Technology, Xi'an, 710021, Shaanxi Province, China.
Haoyi Wang, Lecturer in Computer Science, University of Plymouth. Interests: biometrics.
Wenqiang Guo, School of Electronic Information and Artificial Intelligence, Shaanxi University of Science and Technology, Xi'an, 710021, Shaanxi Province, China.
Jiapan Guo, postdoc in medical image informatics. Interests: medical image analysis.
Changming Sun, CSIRO Data61. Interests: computer vision, image processing, pattern recognition, deep learning.
Lingkun Ma, School of Electronic Information and Artificial Intelligence, Shaanxi University of Science and Technology, Xi'an, 710021, Shaanxi Province, China.
Weichuan Zhang, Full Professor, Shaanxi University of Science & Technology. Interests: image processing, image analysis, pattern recognition, computer vision.