🤖 AI Summary
This work addresses unreliable confidence assessment in sequence-based visual place recognition (VPR) under cross-condition deployment, where calibration and deployment data are not exchangeable and the finite-sample guarantees of conformal prediction no longer hold. The authors propose SAFEVPR, a training-free verification-and-calibration pipeline that replaces the backbone cosine similarity with a patch-level mutual nearest neighbor (MNN) matching score computed from frozen DINOv2 ViT features. It also replaces flat Learn-Then-Test (LTT) calibration with Mondrian conformal LTT, which fits a separate Bonferroni-corrected threshold for each score bin. Under exchangeability these thresholds would give finite-sample false discovery rate (FDR) control; under condition shift, validity is instead checked empirically for each deployment. Across 23 cross-condition setups from Oxford RobotCar, NCLT, and St Lucia, with three frozen VPR backbones, the method is empirically valid on all 23 at the target FDR of α = 0.10, with a mean accepted FDR of 0.014 and a mean true positive rate of 0.75. AnyLoc-VLAD and SuperPoint+LightGlue reach comparable AUROC but fail on more setups under the same calibration, so raw discrimination alone does not ensure conformal validity.
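To make the MNN matching score concrete, the following is a minimal sketch and not the authors' implementation. It assumes that patch descriptors for the query and the retrieved candidate are already available as NumPy arrays (extraction from DINOv2 is not shown), and that the score is the fraction of query patches with a mutual nearest neighbor. The function name `mnn_score` and this particular normalisation are illustrative assumptions; the summary does not specify the exact definition.

```python
import numpy as np


def mnn_score(query_patches: np.ndarray, ref_patches: np.ndarray) -> float:
    """Fraction of query patches that have a mutual nearest neighbor.

    Hypothetical scoring rule for illustration only.
    query_patches: (Nq, D) patch descriptors of the query image.
    ref_patches:   (Nr, D) patch descriptors of the retrieved candidate.
    """
    # L2-normalise rows so that dot products are cosine similarities.
    q = query_patches / np.linalg.norm(query_patches, axis=1, keepdims=True)
    r = ref_patches / np.linalg.norm(ref_patches, axis=1, keepdims=True)

    sim = q @ r.T                    # (Nq, Nr) cosine similarity matrix
    nn_q_to_r = sim.argmax(axis=1)   # best reference patch for each query patch
    nn_r_to_q = sim.argmax(axis=0)   # best query patch for each reference patch

    # Query patch i is a mutual match if its best reference patch
    # selects i as its own best query patch.
    mutual = nn_r_to_q[nn_q_to_r] == np.arange(q.shape[0])
    return float(mutual.mean())
```

A score near 1 means that most patches agree in both directions, whereas a score near 0 means that few correspondences survive the mutual check, as one might expect for a wrong candidate.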
📝 Abstract
Sequence-based visual place recognition (VPR) for SLAM and robot relocalization must decide whether the retrieved top-1 candidate is safe to accept. Conformal prediction is a natural framework for this accept/reject decision, but its finite-sample guarantees rely on exchangeability between calibration and deployment (test) data, which is violated under cross-condition deployment. We introduce SAFEVPR, a training-free verification-and-calibration pipeline for safe cross-condition sequence VPR. SAFEVPR replaces the standard backbone cosine similarity with a mutual-nearest-neighbour (MNN) patch-matching score computed from frozen DINOv2 ViT features, and replaces flat Learn-Then-Test (LTT) calibration with Mondrian conformal LTT, fitting separate Bonferroni-corrected thresholds across score bins. Under exchangeability, these thresholds would provide finite-sample false-discovery-rate (FDR) control; under condition shift, we evaluate empirical validity per deployment. Across 23 cross-condition setups from the Oxford RobotCar, NCLT, and St Lucia datasets, using three frozen VPR backbones, SAFEVPR is empirically valid on 23/23 setups at target FDR α = 0.10, achieving a mean accepted FDR of 0.014 and a mean true-positive rate (TPR) of 0.75. The results show that raw discrimination alone is not sufficient for conformal validity: AnyLoc-VLAD and SuperPoint+LightGlue reach comparable area under the receiver operating characteristic curve (AUROC) but fail on more setups under the same calibration. On textureless, repetitive scenery, SAFEVPR safely abstains rather than accepting unreliable matches. Code is available at https://github.com/Hasar12139/SafeVPR.
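The calibration step can be illustrated with a small sketch of bin-wise Learn-Then-Test with a Bonferroni correction. This is an illustration under stated assumptions, not the released code: the names `binom_cdf`, `calibrate_bin`, and `mondrian_ltt` are hypothetical, the binning variable `bin_key` is assumed to be a per-sample score, the p-value is an exact binomial tail computed on the accepted calibration samples, and the error budget `delta` is split evenly over all (bin, threshold) tests. The abstract does not state which of these choices SAFEVPR makes.

```python
import math

import numpy as np


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(Bin(n, p) <= k) for 0 < p < 1, summed in log space (stdlib only)."""
    total = 0.0
    for i in range(k + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_pmf)
    return min(total, 1.0)


def calibrate_bin(scores, correct, alpha, delta_per_test, grid):
    """Smallest threshold in `grid` certified at level alpha, or inf (always abstain)."""
    best = math.inf
    for lam in grid:
        accepted = scores >= lam
        n_acc = int(accepted.sum())
        if n_acc == 0:
            continue  # nothing accepted, so nothing can be certified
        errors = int((~correct[accepted]).sum())
        # p-value for the null hypothesis "FDR at this threshold exceeds alpha".
        p_value = binom_cdf(errors, n_acc, alpha)
        if p_value <= delta_per_test:
            best = min(best, float(lam))
    return best


def mondrian_ltt(bin_key, scores, correct, bin_edges,
                 alpha=0.10, delta=0.10, grid=None):
    """Fit one acceptance threshold per bin of `bin_key` (hypothetical sketch)."""
    bin_key = np.asarray(bin_key, dtype=float)
    scores = np.asarray(scores, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    grid = np.linspace(0.0, 1.0, 21) if grid is None else np.asarray(grid, dtype=float)

    bin_ids = np.digitize(bin_key, bin_edges)   # values in 0 .. len(bin_edges)
    n_bins = len(bin_edges) + 1
    # Bonferroni: share delta across every (bin, threshold) hypothesis test.
    delta_per_test = delta / (n_bins * len(grid))

    return [calibrate_bin(scores[bin_ids == b], correct[bin_ids == b],
                          alpha, delta_per_test, grid)
            for b in range(n_bins)]
```

At deployment, a match in bin `b` would be accepted only if its score is at least `thresholds[b]`. A bin whose threshold is `inf` never accepts, which corresponds to abstaining when no threshold can be certified from the calibration data. As the abstract notes, any guarantee from such a procedure holds only under exchangeability, so validity under condition shift still has to be checked empirically.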