🤖 AI Summary
To address the sharp performance degradation, poor cross-environment generalization, and catastrophic forgetting observed when deploying visual place recognition (VPR) systems in novel environments, this paper proposes VIPeR, a human-memory-inspired lifelong VPR framework. Methodologically, the authors introduce an adaptive hard-negative mining strategy to balance intra-environment accuracy against cross-environment generalizability; design a three-tier memory bank—comprising sensory, working, and long-term memory modules—in which the first two serve the current environment and the third stores representations of all previously visited environments; and incorporate probabilistic knowledge distillation to mitigate forgetting of previously learned places during model updates. Extensive experiments on Oxford RobotCar, Nordland, and TartanAir show that the approach outperforms naive fine-tuning and state-of-the-art lifelong learning baselines in almost all aspects, with the largest gain of 13.65% in average performance, effectively enabling long-term environmental adaptation and incremental deployment.
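The three-tier memory bank can be pictured as a pipeline of bounded buffers: new observations enter sensory memory, are consolidated into working memory for the current environment, and are archived into long-term memory when an environment is left. The sketch below is an illustrative assumption — the class name, capacities, and promotion rules are not taken from the paper, which does not specify this interface.

```python
from collections import deque

class MemoryBank:
    """Illustrative three-tier memory bank (sensory / working / long-term).

    Capacities and the promotion logic are assumptions for illustration,
    not the paper's actual design.
    """

    def __init__(self, sensory_cap=32, working_cap=256, longterm_cap=1024):
        self.sensory = deque(maxlen=sensory_cap)      # most recent frames
        self.working = deque(maxlen=working_cap)      # current environment
        self.long_term = deque(maxlen=longterm_cap)   # all past environments

    def observe(self, descriptor):
        """New frame descriptors enter sensory memory first."""
        self.sensory.append(descriptor)

    def consolidate(self):
        """Promote sensory contents into working memory."""
        while self.sensory:
            self.working.append(self.sensory.popleft())

    def finish_environment(self):
        """When leaving an environment, archive working memory long-term."""
        self.consolidate()
        while self.working:
            self.long_term.append(self.working.popleft())

    def replay_samples(self, k=8):
        """Draw up to k archived descriptors for rehearsal during updates."""
        return list(self.long_term)[-k:]
```

In this sketch, rehearsal draws from `long_term` during model updates, which is one generic way a memory bank can counter forgetting across environments.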
📝 Abstract
Visual place recognition (VPR) is an essential component of many autonomous and augmented/virtual reality systems, enabling them to robustly localize themselves in large-scale environments. Existing VPR methods deliver attractive performance at the cost of heavy pre-training and limited generalizability; when deployed in unseen environments, they exhibit significant performance drops. Targeting this issue, we present VIPeR, a novel approach for visual incremental place recognition with the ability to adapt to new environments while retaining performance on previously visited ones. We first introduce an adaptive mining strategy that balances the performance within a single environment and the generalizability across multiple environments. Then, to prevent catastrophic forgetting in lifelong learning, we draw inspiration from human memory systems and design a novel memory bank for VIPeR. Our memory bank contains a sensory memory, a working memory, and a long-term memory, with the first two focusing on the current environment and the last covering all previously visited environments. Additionally, we propose a probabilistic knowledge distillation to explicitly safeguard previously learned knowledge. We evaluate VIPeR on three large-scale datasets, namely Oxford RobotCar, Nordland, and TartanAir. For comparison, we first set a baseline performance with naive fine-tuning and then compare against several recent lifelong learning methods. VIPeR achieves better performance in almost all aspects, with the biggest improvement of 13.65% in average performance.
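One generic way to realize knowledge distillation probabilistically is to soften similarity scores from the old (teacher) and updated (student) models into distributions and penalize their KL divergence, so the updated model keeps ranking previously learned places the way the old model did. The function below is a minimal sketch under that assumption; the paper's exact loss formulation may differ.

```python
import numpy as np

def softmax(x, tau=1.0):
    """Temperature-softened softmax over a 1-D array of scores."""
    z = x / tau
    z = z - z.max()          # stabilize the exponentials
    e = np.exp(z)
    return e / e.sum()

def prob_distill_loss(student_sims, teacher_sims, tau=4.0):
    """KL(teacher || student) over softened similarity distributions.

    student_sims / teacher_sims: 1-D arrays of similarities produced by
    the current and previous models against the same set of reference
    places. Names and the temperature value are illustrative assumptions.
    """
    p = softmax(teacher_sims, tau)   # old-model distribution (target)
    q = softmax(student_sims, tau)   # new-model distribution
    eps = 1e-12                      # guard against log(0)
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))
```

The loss is zero when the two models score the reference places identically and grows as the updated model drifts away from the old model's view of past places, which is the "safeguard" role the distillation term plays in lifelong training.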