🤖 AI Summary
To address the dual challenges of insufficient real-time performance and poor robustness against motion blur induced by high-speed UAV motion, this paper proposes BDTrack: a dynamic early-exit single-stream Vision Transformer (ViT) architecture tailored for UAV tracking. Methodologically, BDTrack builds upon fine-tuning a pre-trained ViT within an end-to-end single-stream framework. Its key contributions are: (1) a novel difficulty-aware dynamic early-exit mechanism that adaptively skips redundant transformer blocks during inference, balancing efficiency and accuracy; and (2) the first integration of motion-blur feature invariance into a ViT-based tracker, achieved via synthetic blur augmentation and dedicated regularization. Evaluated on five mainstream benchmarks, BDTrack achieves state-of-the-art (SOTA) performance, operates at over 30 FPS, and improves mean Average Precision (mAP) by 12.6% under motion blur conditions.
📝 Abstract
The recent surge in single-stream architectures built on pre-trained ViT backbones is a promising advance in generic visual tracking. By integrating feature extraction and fusion into one cohesive framework, these architectures offer improved performance, efficiency, and robustness. However, optimizing these frameworks for UAV tracking has received little attention. In this paper, we boost the efficiency of this framework by tailoring it into an adaptive computation framework that dynamically exits Transformer blocks for real-time UAV tracking. The motivation is that tracking tasks with fewer challenges can be adequately addressed using low-level feature representations. This allows the model to use computational resources more efficiently, spending them on complex tasks and conserving them on easier ones. A second enhancement introduced in this paper improves the effectiveness of ViTs in handling motion blur, a common issue in UAV tracking caused by the fast movement of the UAV, the tracked object, or both. We achieve this by learning motion-blur-robust representations, enforcing invariance of the target's feature representation with respect to simulated motion blur. The proposed approach is dubbed BDTrack. Extensive experiments on five tracking benchmarks validate the effectiveness and versatility of our approach, establishing it as a cutting-edge solution for real-time UAV tracking. Code is released at: https://github.com/wuyou3474/BDTrack.
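The two ideas in the abstract can be illustrated with a minimal sketch in plain Python. It is not the released BDTrack code. Every name here (`run_with_early_exit`, `exit_score`, `threshold`, `horizontal_motion_blur`, `invariance_loss`) is hypothetical, the exit rule is assumed to be a simple score threshold, the blur is assumed to be a horizontal box filter, and the invariance term is assumed to be a mean squared error. The abstract does not specify any of these choices.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Image = List[List[float]]


def run_with_early_exit(
    x: Vector,
    blocks: Sequence[Callable[[Vector], Vector]],
    exit_score: Callable[[Vector], float],
    threshold: float,
) -> Tuple[Vector, int]:
    """Apply blocks in order and stop once exit_score(x) reaches threshold.

    Returns the final features and the number of blocks actually executed.
    The gate and threshold are illustrative assumptions, not the paper's rule.
    """
    used = 0
    for block in blocks:
        x = block(x)
        used += 1
        if exit_score(x) >= threshold:
            break  # easy input: skip the remaining blocks
    return x, used


def horizontal_motion_blur(image: Image, kernel_size: int = 3) -> Image:
    """Crude simulated motion blur: average each pixel with its row neighbours.

    kernel_size is assumed odd; indices past the border are clamped to the edge.
    """
    half = kernel_size // 2
    blurred = []
    for row in image:
        n = len(row)
        new_row = []
        for i in range(n):
            window = [row[min(max(j, 0), n - 1)] for j in range(i - half, i + half + 1)]
            new_row.append(sum(window) / len(window))
        blurred.append(new_row)
    return blurred


def invariance_loss(feat_clean: Vector, feat_blur: Vector) -> float:
    """Mean squared error between features of the clean and blurred target."""
    return sum((a - b) ** 2 for a, b in zip(feat_clean, feat_blur)) / len(feat_clean)


# Toy usage: four identical "blocks" that each add 1 to every feature.
toy_blocks = [lambda v: [t + 1.0 for t in v]] * 4
toy_gate = lambda v: v[0] / 4.0  # hypothetical confidence score

_, blocks_easy = run_with_early_exit([2.0], toy_blocks, toy_gate, threshold=0.75)
_, blocks_hard = run_with_early_exit([0.0], toy_blocks, toy_gate, threshold=0.75)
```

In this toy setup the "easy" input crosses the threshold after one block while the "hard" input needs three, which mirrors the motivation that simpler tracking cases can exit earlier. In training, a term like `invariance_loss` would be added to the tracking loss so that features of a target and of its blurred copy stay close.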