TAT-VPR: Ternary Adaptive Transformer for Dynamic and Efficient Visual Place Recognition

📅 2025-05-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the accuracy-efficiency trade-off in loop-closure detection for visual SLAM, this paper proposes TAT-VPR, a ternary adaptive Transformer for visual place recognition whose computational cost can be adjusted at run time. The method has two key components: (1) a dynamic computation control mechanism that combines ternary weight quantization with a learned activation-sparsity gate, so the computational load can be adjusted on demand; and (2) a two-stage knowledge distillation pipeline that preserves descriptor discriminability under ultra-low-bit constraints. The paper reports that computation can be reduced by up to 40% at run time without degrading Recall@1, and that the model can run on micro-UAV and embedded SLAM stacks while matching state-of-the-art localization accuracy.

📝 Abstract
TAT-VPR is a ternary-quantized transformer that brings dynamic accuracy-efficiency trade-offs to visual SLAM loop-closure. By fusing ternary weights with a learned activation-sparsity gate, the model can control computation by up to 40% at run-time without degrading performance (Recall@1). The proposed two-stage distillation pipeline preserves descriptor quality, letting it run on micro-UAV and embedded SLAM stacks while matching state-of-the-art localization accuracy.
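The two ingredients named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the absmean scaling rule, the magnitude-based top-k selection (standing in for the learned gate), and the names `ternarize`, `sparsify`, `sparse_ternary_dot`, and `keep_ratio` are all assumptions made for illustration.

```python
def ternarize(weights):
    """Map float weights to {-1, 0, +1} with one shared scale.

    Assumption: absmean scaling; the paper's exact quantizer may differ.
    """
    scale = sum(abs(w) for w in weights) / len(weights) or 1.0
    ternary = [max(-1, min(1, round(w / scale))) for w in weights]
    return ternary, scale


def sparsify(activations, keep_ratio):
    """Keep only the largest-magnitude activations and zero the rest.

    Assumption: a fixed top-k rule stands in for the learned gate.
    """
    k = max(1, int(len(activations) * keep_ratio))
    order = sorted(range(len(activations)),
                   key=lambda i: abs(activations[i]), reverse=True)
    keep = set(order[:k])
    return [a if i in keep else 0.0 for i, a in enumerate(activations)]


def sparse_ternary_dot(weights, activations, keep_ratio):
    """Dot product with ternary weights and gated activations.

    Returns the value and the number of add/subtract operations needed.
    """
    ternary, scale = ternarize(weights)
    gated = sparsify(activations, keep_ratio)
    # A term costs work only if both the weight and the activation are nonzero.
    ops = sum(1 for t, a in zip(ternary, gated) if t != 0 and a != 0.0)
    value = scale * sum(t * a for t, a in zip(ternary, gated))
    return value, ops


weights = [0.9, -0.05, -1.1, 0.4]
activations = [0.2, -3.0, 0.1, 1.5]
dense_value, dense_ops = sparse_ternary_dot(weights, activations, keep_ratio=1.0)
sparse_value, sparse_ops = sparse_ternary_dot(weights, activations, keep_ratio=0.5)
```

Lowering `keep_ratio` at run time reduces the operation count without touching the stored weights, which is the kind of on-demand control the abstract describes. Whether accuracy holds at a given ratio depends on training and is not something this sketch demonstrates.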
Problem

Research questions and friction points this paper is trying to address.

Dynamic accuracy-efficiency trade-offs in visual SLAM loop-closure
Run-time computation control without performance degradation
Preserving descriptor quality for micro-UAV and embedded SLAM
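Performance degradation in this setting is measured with Recall@1: the fraction of queries whose single nearest database descriptor is a correct match. The sketch below shows one common way to compute it. It assumes cosine similarity and a hypothetical `ground_truth` list of index sets; the paper's evaluation protocol (for example, its distance threshold for a correct match) is not given here.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recall_at_1(queries, database, ground_truth):
    """Fraction of queries whose top-1 retrieval is a correct place.

    ground_truth[i] is the set of database indices that count as the
    same place as query i (a hypothetical format for this example).
    """
    hits = 0
    for i, query in enumerate(queries):
        best = max(range(len(database)),
                   key=lambda j: cosine(query, database[j]))
        hits += best in ground_truth[i]
    return hits / len(queries)


database = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
queries = [[0.9, 0.1], [0.1, 0.8], [-1.0, 0.2]]
ground_truth = [{0}, {1}, {2}]
score = recall_at_1(queries, database, ground_truth)
```

In this toy data the first two queries retrieve the right place and the third does not, so the score is two thirds.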
Innovation

Methods, ideas, or system contributions that make the work stand out.

Ternary-quantized transformer for dynamic efficiency
Learned activation-sparsity gate controls computation
Two-stage distillation preserves descriptor quality
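The summary does not say what the two distillation stages are or which loss they use. As a generic illustration of descriptor distillation only, the sketch below penalizes the gap between a full-precision teacher descriptor and a quantized student descriptor after L2 normalization. The function names and the choice of mean squared error are assumptions, not the paper's objective.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


def descriptor_distillation_loss(student, teacher):
    """Mean squared error between unit-normalized descriptors.

    Assumption: a simple MSE target; the actual pipeline may use a
    different loss in each of its two stages.
    """
    s, t = l2_normalize(student), l2_normalize(teacher)
    return sum((a - b) ** 2 for a, b in zip(s, t)) / len(s)


aligned = descriptor_distillation_loss([2.0, 0.0], [1.0, 0.0])
orthogonal = descriptor_distillation_loss([1.0, 0.0], [0.0, 1.0])
```

Normalizing first makes the loss depend only on descriptor direction, which is what nearest-neighbor retrieval with cosine similarity relies on.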
Oliver Grainge
School of Electronics and Computer Science, University of Southampton, United Kingdom
Michael Milford
QUT Professor | Director, QUT Robotics Centre | ARC Laureate Fellow | Microsoft Fellow
Robotics, computational neuroscience, navigation, SLAM, RatSLAM
Indu Bodala
School of Electronics and Computer Science, University of Southampton, United Kingdom
Sarvapali D. Ramchurn
School of Electronics and Computer Science, University of Southampton, United Kingdom
Shoaib Ehsan
Assoc. Prof, University of Southampton | Reader, University of Essex | Co-I, Responsible AI UK
Computer Vision, Robotics, Embedded Systems, Responsible AI, Visual Place Recognition