🤖 AI Summary
To address the high computational cost and limited robustness of existing descriptor-level fusion methods in multi-reference visual place recognition (VPR), particularly under appearance and viewpoint variations and in multi-sensor scenarios, this paper proposes a training-free, descriptor-agnostic matrix-decomposition framework. Our method jointly models multi-condition reference descriptors, decomposing them into a shared basis representation and condition-specific residual components to enable projection-based residual matching. This work is the first to introduce matrix decomposition into multi-reference VPR; it supports arbitrary pre-trained descriptors while keeping inference lightweight and generalisation strong. Evaluated on the structured multi-viewpoint SotonMV benchmark and on unstructured datasets, our approach achieves up to ~18% higher Recall@1 than single-reference baselines on multi-appearance data and ~5% gains over state-of-the-art multi-reference methods on unstructured data, substantially improving localisation robustness and practicality under complex appearance and viewpoint changes.
📝 Abstract
We address multi-reference visual place recognition (VPR), where reference sets captured under varying conditions are used to improve localisation performance. While deep learning with large-scale training improves robustness, the growing data diversity and model complexity incur substantial computational cost during training and deployment. Descriptor-level fusion via voting or aggregation avoids training, but often targets multi-sensor setups or relies on heuristics that yield limited gains under appearance and viewpoint change. We propose a training-free, descriptor-agnostic approach that jointly models places using multiple reference descriptors via matrix decomposition into basis representations, enabling projection-based residual matching. We also introduce SotonMV, a structured benchmark for multi-viewpoint VPR. On multi-appearance data, our method improves Recall@1 by up to ~18% over single-reference baselines and outperforms multi-reference baselines across appearance and viewpoint changes, with gains of ~5% on unstructured data, demonstrating strong generalisation while remaining lightweight.
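To make the projection-based residual matching idea concrete, below is a minimal sketch of how multiple reference descriptors for a place can be decomposed into a shared basis and how a query can then be scored by its residual after projection. The abstract does not specify the decomposition, so this sketch assumes a truncated SVD, a fixed rank, and residual norm as the matching score; the function names (`build_place_basis`, `residual_score`, `match_query`) are illustrative, not from the paper.

```python
# Hedged sketch: SVD-based basis per place + residual-norm matching.
# Assumptions (not confirmed by the paper): truncated SVD as the decomposition,
# a fixed subspace rank, and smaller residual norm meaning a better match.
import numpy as np

def build_place_basis(ref_descriptors: np.ndarray, rank: int) -> np.ndarray:
    """ref_descriptors: (D, K) matrix stacking K reference descriptors of one place.
    Returns an orthonormal basis (D, rank) spanning their shared subspace."""
    U, _, _ = np.linalg.svd(ref_descriptors, full_matrices=False)
    return U[:, :rank]

def residual_score(query: np.ndarray, basis: np.ndarray) -> float:
    """Norm of the query component not explained by the place basis
    (the projection residual); smaller means a closer match."""
    projection = basis @ (basis.T @ query)
    return float(np.linalg.norm(query - projection))

def match_query(query: np.ndarray, place_bases: list[np.ndarray]) -> int:
    """Return the index of the place whose subspace best explains the query."""
    scores = [residual_score(query, basis) for basis in place_bases]
    return int(np.argmin(scores))

# Toy usage: 3 places, 4 reference descriptors each, 256-D descriptors.
rng = np.random.default_rng(0)
places = [rng.standard_normal((256, 4)) for _ in range(3)]
bases = [build_place_basis(refs, rank=2) for refs in places]
query = places[1][:, 0] + 0.05 * rng.standard_normal(256)  # noisy view of place 1
print(match_query(query, bases))  # expected: 1
```

Because the per-place bases are computed once from pre-trained descriptors and matching reduces to a projection and a norm, this kind of scheme stays training-free and lightweight at inference, which is the property the abstract emphasises.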