🤖 AI Summary
This work reframes visual place recognition (VPR) as an image-pair retrieval task to better support downstream applications such as scene registration, SLAM, and structure-from-motion. The study comparatively evaluates prominent VPR methods (NetVLAD, CosPlace, EigenPlaces, MixVPR, AnyLoc, SALAD, and MegaLoc) on three datasets from different domains: Tanks and Temples (object-centric outdoor scenes), ScanNet-GS (indoor RGB-D scans), and KITTI (autonomous-driving sequences). VPR is treated explicitly as a front-end that retrieves top-matching image pairs between two disjoint image sets, which exposes the domain-dependent strengths and weaknesses of existing approaches under challenges such as perceptual aliasing and incomplete sequences. The results indicate that modern global descriptors are increasingly suitable as off-the-shelf retrieval modules, offering practical guidance for choosing VPR components in robust 3D mapping and registration pipelines.
📝 Abstract
Visual Place Recognition (VPR) is a core component in computer vision, typically formulated as an image retrieval task for localization, mapping, and navigation. In this work, we instead study VPR as an image-pair retrieval front-end for registration pipelines, where the goal is to find the top-matching image pairs between two disjoint image sets for downstream tasks such as scene registration, SLAM, and Structure-from-Motion. We comparatively evaluate state-of-the-art VPR families (NetVLAD-style baselines; classification-based global descriptors: CosPlace, EigenPlaces; feature-mixing: MixVPR; and foundation-model-driven methods: AnyLoc, SALAD, MegaLoc) on three challenging datasets: object-centric outdoor scenes (Tanks and Temples), indoor RGB-D scans (ScanNet-GS), and autonomous-driving sequences (KITTI). We show that modern global descriptor approaches are increasingly suitable as off-the-shelf image-pair retrieval modules in challenging scenarios, including perceptual aliasing and incomplete sequences, while exhibiting clear, domain-dependent strengths and weaknesses that are critical when choosing VPR components for robust mapping and registration.
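To make the image-pair retrieval setting concrete, the following is a minimal sketch of how top-matching pairs between two disjoint image sets might be selected from global descriptors. It is an illustration under stated assumptions, not the pipeline evaluated in the paper: the function names `l2_normalize` and `top_k_pairs`, the use of cosine similarity as the matching score, and the toy descriptor values are all hypothetical. In practice the descriptors would come from one of the VPR models listed above.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a descriptor to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def top_k_pairs(
    desc_a: Sequence[Sequence[float]],
    desc_b: Sequence[Sequence[float]],
    k: int,
) -> List[Tuple[float, int, int]]:
    """Return the k best (similarity, index_in_a, index_in_b) triples.

    Assumption: cosine similarity between global descriptors is the
    matching score; every image in set A is compared with every image
    in set B, and the highest-scoring cross-set pairs are kept.
    """
    a = [l2_normalize(v) for v in desc_a]
    b = [l2_normalize(v) for v in desc_b]
    scored = (
        (sum(x * y for x, y in zip(va, vb)), i, j)
        for i, va in enumerate(a)
        for j, vb in enumerate(b)
    )
    # nlargest returns the triples sorted by descending similarity.
    return heapq.nlargest(k, scored)


# Toy descriptors standing in for the output of a VPR model (illustrative only).
set_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
set_b = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.98], [0.1, 0.95, 0.0]]

pairs = top_k_pairs(set_a, set_b, k=2)
for score, i, j in pairs:
    print(f"A[{i}] <-> B[{j}]  cosine similarity = {score:.3f}")
```

The exhaustive comparison costs one dot product per cross-set pair, which is fine for a sketch; for large image sets a vectorized or approximate nearest-neighbour search would be the more usual choice. The retrieved pairs would then be handed to the downstream registration, SLAM, or Structure-from-Motion stage.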