Learning Motion Blur Robust Vision Transformers with Dynamic Early Exit for Real-Time UAV Tracking

📅 2024-07-07
🏛️ arXiv.org
📈 Citations: 5 (influential: 0)
🤖 AI Summary
To address two challenges in UAV tracking, insufficient real-time performance and poor robustness to the motion blur caused by fast UAV motion, this paper proposes BDTrack, a dynamic early-exit single-stream Vision Transformer (ViT) architecture tailored for UAV tracking. BDTrack fine-tunes a pre-trained ViT within an end-to-end single-stream framework. Its key contributions are: (1) a difficulty-aware dynamic early-exit mechanism that adaptively skips redundant Transformer blocks during inference, balancing efficiency and accuracy; and (2) the first integration of motion-blur feature invariance into a ViT-based tracker, achieved via synthetic blur augmentation and dedicated regularization. Evaluated on five mainstream benchmarks, BDTrack achieves state-of-the-art (SOTA) performance, runs at over 30 FPS, and improves mean Average Precision (mAP) by 12.6% under motion blur conditions.
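The early-exit idea can be illustrated with a small, framework-free sketch. It assumes a generic confidence-threshold exit rule: after each block, a hypothetical `exit_scorer` returns a score, and inference stops once that score reaches `threshold`. The names `early_exit_forward`, `exit_scorer`, `threshold`, and `min_blocks` are illustrative and do not come from the BDTrack code, and the paper's actual difficulty criterion may differ. The "blocks" are plain functions on feature lists standing in for Transformer blocks.

```python
from typing import Callable, List, Sequence, Tuple

Features = List[float]
Block = Callable[[Features], Features]
ExitScorer = Callable[[Features], float]


def early_exit_forward(
    x: Features,
    blocks: Sequence[Block],
    exit_scorer: ExitScorer,
    threshold: float,
    min_blocks: int = 1,
) -> Tuple[Features, int]:
    """Run `blocks` in order, stopping early once the exit score is high enough.

    Returns the features at the exit point and the number of blocks executed.
    Hypothetical exit rule (an assumption, not BDTrack's exact criterion):
    exit as soon as exit_scorer(features) >= threshold.
    """
    for depth, block in enumerate(blocks, start=1):
        x = block(x)
        # Never exit before `min_blocks`, so every input gets some processing.
        if depth >= min_blocks and exit_scorer(x) >= threshold:
            return x, depth
    # No early exit: the full stack of blocks was executed.
    return x, len(blocks)


if __name__ == "__main__":
    # Toy stand-ins for Transformer blocks and an exit head (hypothetical).
    add_one: Block = lambda feats: [f + 1.0 for f in feats]
    blocks = [add_one] * 12
    confidence: ExitScorer = lambda feats: min(feats)

    # "Easy" input exits after one block: ([9.0, 10.0], 1)
    print(early_exit_forward([8.0, 9.0], blocks, confidence, threshold=9.0))
    # "Hard" input needs eight blocks: ([9.0, 10.0], 8)
    print(early_exit_forward([1.0, 2.0], blocks, confidence, threshold=9.0))
```

Inputs that already score high leave after a single block, while harder ones run deeper, which is where the computational savings come from. In a real tracker the score would be computed from intermediate token features, for example by a small learned head; that too is an assumption about the design rather than a description of the paper's implementation.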

📝 Abstract
Recently, the growing adoption of single-stream architectures built on pre-trained ViT backbones has marked a promising advance in generic visual tracking. By integrating feature extraction and fusion into a single framework, these architectures offer improved performance, efficiency, and robustness. However, optimizing these frameworks for UAV tracking has received little attention. In this paper, we boost the efficiency of this framework by tailoring it into an adaptive computation framework that dynamically exits Transformer blocks for real-time UAV tracking. The motivation is that less challenging tracking cases can be handled adequately with lower-level feature representations. This lets the model use computational resources more efficiently, spending computation on difficult cases and conserving it on easier ones. Another significant enhancement introduced in this paper is the improved effectiveness of ViTs in handling motion blur, a common issue in UAV tracking caused by fast movement of the UAV, the tracked object, or both. This is achieved by learning motion-blur-robust representations, enforcing invariance of the target's feature representation with respect to simulated motion blur. The proposed approach is dubbed BDTrack. Extensive experiments on five tracking benchmarks validate the effectiveness and versatility of our approach, establishing it as a cutting-edge solution for real-time UAV tracking. Code is released at: https://github.com/wuyou3474/BDTrack.
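The motion-blur invariance objective can be sketched in the same spirit. This sketch assumes a simple blur model, a horizontal 1 x `length` box kernel with edge clamping, and measures invariance as the mean squared distance between features of the clean and the blurred target. The functions `horizontal_motion_blur` and `invariance_loss` and the `feature_fn` argument are hypothetical; the paper's blur synthesis and regularizer may differ. During training, such a term would typically be added to the tracking loss with a weighting factor, which is also an assumption here.

```python
from typing import Callable, List, Sequence

Image = List[List[float]]


def horizontal_motion_blur(image: Image, length: int) -> Image:
    """Simulate linear horizontal motion blur with a 1 x `length` box kernel.

    Assumed blur model: each output pixel is the mean of `length` horizontally
    adjacent input pixels; out-of-range indices are clamped to the row edge.
    """
    if length < 1:
        raise ValueError("length must be >= 1")
    half = length // 2
    blurred: Image = []
    for row in image:
        width = len(row)
        out_row: List[float] = []
        for x in range(width):
            # Offsets -half .. length-half-1 cover exactly `length` pixels.
            window = [row[min(max(x + dx, 0), width - 1)]
                      for dx in range(-half, length - half)]
            out_row.append(sum(window) / length)
        blurred.append(out_row)
    return blurred


def invariance_loss(
    clean: Image,
    blurred: Image,
    feature_fn: Callable[[Image], Sequence[float]],
) -> float:
    """Mean squared distance between features of the clean and blurred target.

    Zero means `feature_fn` is fully invariant to this particular blur.
    """
    a = feature_fn(clean)
    b = feature_fn(blurred)
    if len(a) != len(b) or not a:
        raise ValueError("feature vectors must be non-empty and equal length")
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)


if __name__ == "__main__":
    target = [[0.0, 3.0, 0.0]]
    blurred = horizontal_motion_blur(target, 3)
    print(blurred)  # [[1.0, 1.0, 1.0]]

    # Hypothetical feature extractors: row sums ignore this blur, row maxima do not.
    row_sums = lambda img: [sum(r) for r in img]
    row_max = lambda img: [max(r) for r in img]
    print(invariance_loss(target, blurred, row_sums))  # 0.0
    print(invariance_loss(target, blurred, row_max))   # 4.0
```

In this toy example, the row-sum features are unaffected by the blur and incur zero loss, while the blur-sensitive row maximum is penalized. Minimizing such a term pushes a learned feature extractor toward the former behavior.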
Problem

Research questions and friction points this paper is trying to address.

Addressing motion blur in UAV tracking via robust vision transformers
Enabling real-time processing with adaptive computation framework
Improving efficiency for high-speed UAV and target movements
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic block exiting for real-time efficiency
Motion blur robust feature representation learning
Adaptive computation framework for UAV tracking
Authors

You Wu
Guilin University of Technology, Guilin 541006, China
Xucheng Wang
Guilin University of Technology, Guilin 541006, China
Dan Zeng
Sun Yat-sen University
Biometrics, computer vision, deep learning
Hengzhou Ye
Guilin University of Technology, Guilin 541006, China; Guangxi Key Laboratory of Embedded Technology and Intelligent Information Processing, Guilin 541006, China
Xiaolan Xie
Guilin University of Technology, Guilin 541006, China; Guangxi Key Laboratory of Embedded Technology and Intelligent Information Processing, Guilin 541006, China
Qijun Zhao
Professor of Computer Science, Sichuan University
Biometrics, 3D Vision, Object Detection and Recognition, Face Recognition, Fingerprint Recognition
Shuiwang Li
Guilin University of Technology