Joint Multi-Condition Representation Modelling via Matrix Factorisation for Visual Place Recognition

📅 2025-10-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the high computational cost and limited robustness of existing descriptor-level fusion methods in multi-reference visual place recognition (VPR), particularly under appearance/viewpoint variations and in multi-sensor scenarios, this paper proposes a training-free, descriptor-agnostic matrix decomposition framework. The method jointly models multi-condition reference descriptors to derive a shared basis representation and condition-specific residual components, enabling efficient residual-projection matching. The work is the first to introduce matrix decomposition into multi-reference VPR; it supports arbitrary pre-trained descriptors while keeping inference lightweight and generalisation strong. Evaluated on the structured multi-viewpoint SotonMV benchmark and on unstructured datasets, the approach achieves ~18% higher Recall@1 than single-reference baselines and a ~5% improvement over state-of-the-art multi-reference methods, significantly enhancing localisation robustness and practicality under complex appearance and viewpoint changes.
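The summary sketches the pipeline only at a high level, so a minimal NumPy illustration of one plausible instantiation follows: stack each place's reference descriptors from multiple conditions, factorise the stack into a low-rank shared basis (truncated SVD here is an assumption; the paper's exact decomposition may differ), and match a query by the norm of its residual after projection onto each place's basis. Function names, the rank parameter, and the normalisation choices are all hypothetical.

```python
# Hedged sketch of basis construction + residual-projection matching for
# multi-reference VPR. Truncated SVD and all names here are assumptions,
# not the authors' exact formulation.
import numpy as np

def build_place_bases(ref_descriptors, rank=4):
    """ref_descriptors: list over places; each entry is an (n_conditions, d)
    array of pre-trained descriptors of one place under different conditions.
    Returns one orthonormal basis of shape (d, rank) per place; rank must
    not exceed the number of reference conditions."""
    bases = []
    for D in ref_descriptors:
        D = D / np.linalg.norm(D, axis=1, keepdims=True)   # L2-normalise rows
        # Right singular vectors span the shared descriptor subspace; what
        # the top-rank directions miss acts as the per-condition residual.
        _, _, Vt = np.linalg.svd(D, full_matrices=False)
        bases.append(Vt[:rank].T)                          # shape (d, rank)
    return bases

def match_query(query, bases):
    """Score each place by the residual left after projecting the query
    descriptor onto that place's basis; the smallest residual wins."""
    q = query / np.linalg.norm(query)
    residuals = [np.linalg.norm(q - B @ (B.T @ q)) for B in bases]
    return int(np.argmin(residuals)), residuals
```

Because the bases are computed once per map with plain linear algebra, the sketch stays training-free and descriptor-agnostic: any fixed pre-trained descriptor can feed it unchanged.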

📝 Abstract
We address multi-reference visual place recognition (VPR), where reference sets captured under varying conditions are used to improve localisation performance. While deep learning with large-scale training improves robustness, increasing data diversity and model complexity incur extensive computational cost during training and deployment. Descriptor-level fusion via voting or aggregation avoids training, but often targets multi-sensor setups or relies on heuristics with limited gains under appearance and viewpoint change. We propose a training-free, descriptor-agnostic approach that jointly models places using multiple reference descriptors via matrix decomposition into basis representations, enabling projection-based residual matching. We also introduce SotonMV, a structured benchmark for multi-viewpoint VPR. On multi-appearance data, our method improves Recall@1 by up to ~18% over single-reference and outperforms multi-reference baselines across appearance and viewpoint changes, with gains of ~5% on unstructured data, demonstrating strong generalisation while remaining lightweight.
Problem

Research questions and friction points this paper is trying to address.

Addresses multi-reference visual place recognition under varying conditions
Reduces computational costs of training and deployment for VPR
Improves localisation accuracy across appearance and viewpoint changes
Innovation

Methods, ideas, or system contributions that make the work stand out.

Matrix decomposition creates basis representations for places
Projection-based residual matching enables robust localisation (see the toy evaluation after this list)
Training-free descriptor-agnostic approach generalizes across conditions
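As a usage note for the residual-matching idea above, the toy run below computes Recall@1, the metric reported in the summary; it reuses the hypothetical build_place_bases and match_query from the earlier sketch, and the random data is illustrative only.

```python
# Toy Recall@1 check for the matcher sketched earlier; shapes and the
# synthetic descriptors are illustrative, not from the paper.
import numpy as np

rng = np.random.default_rng(0)
d, n_places, n_conditions = 128, 50, 5
refs = [rng.normal(size=(n_conditions, d)) for _ in range(n_places)]
bases = build_place_bases(refs, rank=3)

# One query per place: a noisy copy of one reference condition
queries = [refs[p][0] + 0.1 * rng.normal(size=d) for p in range(n_places)]
hits = sum(match_query(q, bases)[0] == p for p, q in enumerate(queries))
print(f"Recall@1: {hits / n_places:.2f}")
```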
👥 Authors
Timur Ismagilov
School of Electronics and Computer Science, University of Southampton, SO17 1BJ Southampton, U.K.
Shakaiba Majeed
Department of Computer Science and Engineering, Hanyang University, Seoul, South Korea
Michael Milford
QUT Professor | Director, QUT Robotics Centre | ARC Laureate Fellow | Microsoft Fellow
Robotics · computational neuroscience · navigation · SLAM · RatSLAM
Tan Viet Tuyen Nguyen
School of Electronics and Computer Science, University of Southampton, SO17 1BJ Southampton, U.K.
Sarvapali D. Ramchurn
School of Electronics and Computer Science, University of Southampton, SO17 1BJ Southampton, U.K.
Shoaib Ehsan
Assoc. Prof, University of Southampton | Reader, University of Essex | Co-I, Responsible AI UK
Computer Vision · Robotics · Embedded Systems · Responsible AI · Visual Place Recognition