🤖 AI Summary
To address the trade-off between accuracy and efficiency in visual SLAM loop-closure detection, this paper proposes TAT-VPR, a ternary adaptive Transformer architecture for dynamically adjustable visual place recognition. The method introduces two key innovations: (1) a dynamic computation control mechanism that combines ternary-weight quantization with a learnable sparse activation gate, so the computational load can be adjusted on demand at run time; and (2) a two-stage knowledge distillation pipeline that preserves descriptor discriminability under ultra-low-bit constraints. Experiments show that the approach reduces computational cost by up to 40% with no loss in Recall@1, significantly outperforming existing lightweight methods. The model is also light enough to run on micro aerial vehicles and embedded SLAM systems, where it matches state-of-the-art localization accuracy.
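To make the ternary-weight idea concrete, here is a minimal NumPy sketch of quantizing a weight matrix to {-1, 0, +1} with a single scale factor. The absmean scaling rule and the names `ternarize` and `ternary_linear` are assumptions chosen for illustration; the summary does not say which ternarization scheme TAT-VPR uses.

```python
import numpy as np


def ternarize(weights, eps=1e-8):
    """Quantize weights to {-1, 0, +1} plus one per-tensor scale.

    Assumption: absmean scaling (divide by the mean absolute value, round,
    clip). This is one common ternarization rule, not necessarily the
    scheme used in the paper.
    """
    scale = np.mean(np.abs(weights)) + eps
    ternary = np.clip(np.round(weights / scale), -1, 1).astype(np.int8)
    return ternary, scale


def ternary_linear(x, ternary, scale):
    """Linear layer y = x @ W^T using the ternary weights and the scale."""
    return (x @ ternary.T.astype(x.dtype)) * scale


rng = np.random.default_rng(0)
w = rng.normal(size=(8, 16)).astype(np.float32)   # hypothetical layer weights
x = rng.normal(size=(4, 16)).astype(np.float32)   # hypothetical input batch

w_t, s = ternarize(w)
y = ternary_linear(x, w_t, s)
```

Each weight then needs fewer than two bits of storage, and the matrix product reduces to additions and subtractions followed by one multiplication by the scale.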
📝 Abstract
TAT-VPR is a ternary-quantized transformer that brings dynamic accuracy-efficiency trade-offs to visual SLAM loop closure. By fusing ternary weights with a learned activation-sparsity gate, the model can reduce computation by up to 40% at run time without degrading performance (Recall@1). The proposed two-stage distillation pipeline preserves descriptor quality, letting the model run on micro-UAV and embedded SLAM stacks while matching state-of-the-art localization accuracy.
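The run-time adjustment described above can be pictured with a simple activation-sparsity gate. The sketch below is a stand-in under stated assumptions: it keeps only the largest-magnitude fraction of each activation row, with `keep_ratio` as a hypothetical run-time knob. The gate in TAT-VPR is learned, and the abstract does not specify its form, so this fixed top-k rule is illustrative only.

```python
import numpy as np


def sparsity_gate(activations, keep_ratio):
    """Zero all but the largest-magnitude fraction of each row.

    Assumption: a fixed top-k magnitude rule stands in for the learned
    gate; keep_ratio is a hypothetical knob set at run time.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    n = activations.shape[-1]
    k = max(1, int(round(keep_ratio * n)))
    # Indices of the k largest magnitudes in each row.
    idx = np.argpartition(np.abs(activations), n - k, axis=-1)[..., n - k:]
    mask = np.zeros(activations.shape, dtype=bool)
    np.put_along_axis(mask, idx, True, axis=-1)
    return activations * mask


rng = np.random.default_rng(0)
acts = rng.normal(size=(4, 10)).astype(np.float32)  # hypothetical activations

full = sparsity_gate(acts, 1.0)     # full-accuracy mode: nothing dropped
reduced = sparsity_gate(acts, 0.6)  # drops 40% of the activations per row
```

Dropping 40% of the activations does not by itself guarantee a 40% compute saving; the saving depends on whether the downstream kernels actually skip the zeroed entries.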