TAT-VPR: Ternary Adaptive Transformer for Dynamic and Efficient Visual Place Recognition

📅 2025-05-22

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

To address the accuracy-efficiency trade-off in loop-closure detection for visual SLAM, this paper proposes TAT-VPR, a ternary adaptive Transformer for visual place recognition whose computational load can be adjusted at run time. The method has two main components: (1) a computation control mechanism that combines ternary weight quantization with a learned activation-sparsity gate, so that compute can be scaled on demand; and (2) a two-stage knowledge distillation pipeline that preserves descriptor quality under ternary quantization. According to the abstract, computation can be reduced by up to 40% at run time without degrading Recall@1, and the model is light enough to run on micro-UAV and embedded SLAM stacks while matching state-of-the-art localization accuracy.


📝 Abstract

TAT-VPR is a ternary-quantized transformer that brings dynamic accuracy-efficiency trade-offs to visual SLAM loop closure. By fusing ternary weights with a learned activation-sparsity gate, the model can reduce computation by up to 40% at run time without degrading Recall@1. The proposed two-stage distillation pipeline preserves descriptor quality, letting the model run on micro-UAV and embedded SLAM stacks while matching state-of-the-art localization accuracy.
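To make "ternary weights" concrete, here is a minimal sketch of one common way to ternarize a weight vector. The abstract does not say which quantizer TAT-VPR uses, so the absolute-mean scaling below and the names `ternarize` and `dequantize` are assumptions for illustration only.

```python
def ternarize(weights, eps=1e-8):
    """Map float weights to values in {-1, 0, +1} plus one shared scale.

    Assumption: absolute-mean scaling, chosen here only for illustration;
    the paper's actual quantizer may differ.
    """
    scale = max(sum(abs(w) for w in weights) / len(weights), eps)
    # Divide by the scale, round to the nearest integer, clip to [-1, 1].
    ternary = [max(-1, min(1, round(w / scale))) for w in weights]
    return ternary, scale


def dequantize(ternary, scale):
    """Approximate the original weights from the ternary codes."""
    return [t * scale for t in ternary]


weights = [0.9, -0.05, -1.2, 0.3]
codes, scale = ternarize(weights)
print(codes)  # [1, 0, -1, 0]
```

Each weight then needs under two bits of storage, and a multiply by a weight becomes an add, a subtract, or a skip, which is what makes this kind of model attractive on embedded hardware.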

Problem

Research questions and friction points this paper is trying to address.

Dynamic accuracy-efficiency trade-offs in visual SLAM loop-closure

Run-time computation control without performance degradation

Preserving descriptor quality for micro-UAV and embedded SLAM

Innovation

Methods, ideas, or system contributions that make the work stand out.

Ternary-quantized transformer for dynamic efficiency

Learned activation-sparsity gate controls computation

Two-stage distillation preserves descriptor quality
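How an activation-sparsity gate can turn into a run-time compute knob is sketched below. This is not the paper's learned gate: as a stand-in it uses a fixed top-k magnitude rule, and `topk_gate`, `sparse_ternary_dot`, and `keep_ratio` are hypothetical names introduced for this example.

```python
def topk_gate(activations, keep_ratio):
    """Keep the largest-magnitude fraction of activations, zero the rest.

    Stand-in for a learned gate: a plain top-k magnitude rule (assumption).
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    k = max(1, round(keep_ratio * len(activations)))
    order = sorted(range(len(activations)),
                   key=lambda i: abs(activations[i]), reverse=True)
    keep = set(order[:k])
    return [a if i in keep else 0.0 for i, a in enumerate(activations)]


def sparse_ternary_dot(ternary_row, scale, activations):
    """Dot product with ternary weights, skipping zero weights and activations.

    Returns the result and the number of add/subtract operations performed.
    """
    total, ops = 0.0, 0
    for t, a in zip(ternary_row, activations):
        if t == 0 or a == 0.0:
            continue  # nothing to compute for this pair
        total += a if t > 0 else -a
        ops += 1
    return total * scale, ops


acts = [0.1, -2.0, 0.5, 3.0, -0.2]
row, scale = [1, -1, 1, 1, -1], 0.5

_, dense_ops = sparse_ternary_dot(row, scale, acts)
value, gated_ops = sparse_ternary_dot(row, scale, topk_gate(acts, 0.6))
print(dense_ops, gated_ops)  # 5 3
```

In this toy case a `keep_ratio` of 0.6 skips 40% of the operations; lowering or raising it at run time trades accuracy for compute without touching the weights.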

🔎 Similar Papers

Pair-VPR: Place-Aware Pre-training and Contrastive Pair Classification for Visual Place Recognition with Vision Transformers