TAT-VPR: Ternary Adaptive Transformer for Dynamic and Efficient Visual Place Recognition

📅 2025-05-22

🤖 AI Summary
To address the accuracy-efficiency trade-off in loop-closure detection for visual SLAM, this paper proposes TAT-VPR, a ternary adaptive Transformer for visual place recognition whose computational load can be adjusted at run time. The method makes two key contributions: (1) a dynamic computation control mechanism that combines ternary weight quantization with a learned activation-sparsity gate, so that computation can be scaled on demand; and (2) a two-stage knowledge distillation pipeline that preserves descriptor discriminability under ultra-low-bit constraints. The model can cut computation by up to 40% at run time without degrading Recall@1. It is light enough to run on micro-UAV and embedded SLAM stacks while matching state-of-the-art localization accuracy.

📝 Abstract
TAT-VPR is a ternary-quantized transformer that brings dynamic accuracy-efficiency trade-offs to visual SLAM loop-closure. By fusing ternary weights with a learned activation-sparsity gate, the model can reduce computation by up to 40% at run-time without degrading performance (Recall@1). The proposed two-stage distillation pipeline preserves descriptor quality, letting the model run on micro-UAV and embedded SLAM stacks while matching state-of-the-art localization accuracy.
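
To make "ternary weights" concrete, here is a minimal sketch of one common ternarization rule: scale by the mean absolute weight, then round and clip to {-1, 0, +1}. The abstract does not say which rule TAT-VPR uses, so the absmean scaling and the names `ternarize` and `dequantize` are assumptions for illustration only.

```python
def ternarize(weights, eps=1e-8):
    """Map a flat list of float weights to codes in {-1, 0, +1} plus one scale.

    Assumption: absmean scaling (scale = mean |w|), then round and clip.
    The source does not state which ternarization rule TAT-VPR uses.
    """
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Reconstruct approximate float weights from ternary codes."""
    return [c * scale for c in codes]


# Hypothetical weights, chosen only to show all three code values.
weights = [0.9, -0.05, 0.4, -1.2, 0.02, -0.6]
codes, scale = ternarize(weights)
approx = dequantize(codes, scale)
print(codes)  # [1, 0, 1, -1, 0, -1]
```

Each weight then needs fewer than two bits of storage plus one shared scale, and multiplications by the codes reduce to additions, subtractions, or skips, which is where the efficiency of ternary models typically comes from.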
Problem

Research questions and friction points this paper is trying to address.

Dynamic accuracy-efficiency trade-offs in visual SLAM loop-closure
Run-time computation control without performance degradation
Preserving descriptor quality for micro-UAV and embedded SLAM
Innovation

Methods, ideas, or system contributions that make the work stand out.

Ternary-quantized transformer for dynamic efficiency
Learned activation-sparsity gate controls computation
Two-stage distillation preserves descriptor quality
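
The idea of a run-time sparsity control can be sketched as follows. This is a stand-in, not the paper's gate: the source describes a learned activation-sparsity gate, whereas this sketch simply keeps the largest-magnitude activations. The function name `sparsify` and the parameter `keep_ratio` are hypothetical.

```python
def sparsify(activations, keep_ratio):
    """Keep the largest-magnitude activations and zero out the rest.

    Assumption: a fixed magnitude-based top-k rule stands in for the
    learned gate described in the source. keep_ratio is the run-time knob.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    k = max(1, round(keep_ratio * len(activations)))
    # Rank indices by descending magnitude; sorted() is stable, so ties
    # are broken by position.
    ranked = sorted(range(len(activations)), key=lambda i: -abs(activations[i]))
    keep = set(ranked[:k])
    return [a if i in keep else 0.0 for i, a in enumerate(activations)]


# Hypothetical activation vector.
acts = [0.1, -2.0, 0.7, 0.05, -0.3, 1.5, 0.0, -0.9, 0.2, 0.4]
full = sparsify(acts, 1.0)      # full budget: nothing is dropped
reduced = sparsify(acts, 0.6)   # reduced budget: 4 of 10 entries zeroed
print(reduced)  # [0.0, -2.0, 0.7, 0.0, -0.3, 1.5, 0.0, -0.9, 0.0, 0.4]
```

Changing `keep_ratio` between calls changes how many activations survive without retraining or swapping models. How much wall-clock or energy saving a given sparsity level yields depends on whether the downstream kernels can skip zeros, which this sketch does not model.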