Target-aware Bidirectional Fusion Transformer for Aerial Object Tracking

📅 2025-03-13
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
In aerial remote sensing, lightweight trackers suffer from insufficient decoupling between identification and localization capabilities due to single-stage feature fusion, limiting both robustness and accuracy. To address this, we propose a target-aware Bidirectional Fusion Transformer (BFTrans). Our approach features: (1) a two-stream linear self- and cross-attention network enabling forward–backward multi-stage feature decoupling and fusion; (2) target-aware positional encoding that jointly models local details and global semantics; and (3) a lightweight Transformer design suited to embedded deployment. Evaluated on UAV-123, UAV20L, and UAVTrack112, our method achieves state-of-the-art performance in both accuracy and robustness. On embedded platforms, it operates at 30.5 FPS, demonstrating an effective balance between high precision and real-time inference.
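The dual-stream fusion described above can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the elu-based kernel feature map (a common choice for linear attention) and the residual fusion of shallow and deep feature streams are assumptions made for illustration only.

```python
import numpy as np

def linear_attention(q, k, v, eps=1e-6):
    # Linear (kernelized) attention: phi(Q) (phi(K)^T V) with phi(x) = elu(x) + 1,
    # which scales linearly in sequence length instead of quadratically.
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))  # elu(x) + 1
    q, k = phi(q), phi(k)
    kv = k.T @ v                                   # (d, d_v) key-value summary
    z = q @ k.sum(axis=0, keepdims=True).T + eps   # per-query normalizer
    return (q @ kv) / z

def bidirectional_fuse(shallow, deep):
    # Forward stream: shallow features query deep semantics (for recognition).
    # Backward stream: deep features query shallow details (for localization).
    fwd = linear_attention(shallow, deep, deep)
    bwd = linear_attention(deep, shallow, shallow)
    return fwd + shallow, bwd + deep               # residual connections
```

Running both directions yields two complementary fused feature maps, mirroring the idea that recognition and localization draw on different feature stages.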

📝 Abstract
Trackers based on lightweight neural networks have achieved great success in aerial remote sensing, most of them aggregating multi-stage deep features to improve tracking quality. However, existing algorithms usually generate only single-stage fusion features for state decisions, ignoring the fact that diverse kinds of features are required for identifying and locating the object, which limits the robustness and precision of tracking. In this paper, we propose a novel target-aware Bidirectional Fusion Transformer (BFTrans) for UAV tracking. Specifically, we first present a two-stream fusion network based on linear self- and cross-attention, which combines shallow and deep features from both forward and backward directions, providing adjusted local details for localization and global semantics for recognition. In addition, a target-aware positional encoding strategy is designed for this fusion model, helping it perceive object-related attributes during the fusion phase. Finally, the proposed method is evaluated on several popular UAV benchmarks, including UAV-123, UAV20L and UAVTrack112. Extensive experimental results demonstrate that our approach exceeds other state-of-the-art trackers and runs at an average speed of 30.5 FPS on an embedded platform, which is appropriate for practical drone deployments.
Problem

Research questions and friction points this paper is trying to address.

Improves aerial object tracking using bidirectional feature fusion.
Enhances tracking robustness and precision with multi-stage features.
Optimizes UAV tracking for practical drone deployment efficiency.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bidirectional Fusion Transformer for UAV tracking
Two-stream network with linear self and cross attentions
Target-aware positional encoding for object perception
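The target-aware positional encoding idea can be sketched as follows. The paper's exact formulation is not given here, so this is a hypothetical illustration: a standard sinusoidal encoding over a flattened 2-D feature grid, modulated by each position's normalized distance to the target center. The exponential distance weighting and all parameter names are assumptions.

```python
import numpy as np

def target_aware_pe(h, w, cx, cy, d_model=64):
    # Hypothetical sketch: sinusoidal positional encoding over an h x w grid,
    # weighted so that positions near the target center (cx, cy) are emphasized.
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    dist = np.sqrt(((xs - cx) / w) ** 2 + ((ys - cy) / h) ** 2)
    weight = np.exp(-dist)                         # peaks at the target center
    freqs = 1.0 / (10000 ** (np.arange(d_model // 2) / (d_model // 2)))
    pos = (xs + ys * w).reshape(-1, 1)             # flatten the 2-D grid
    pe = np.concatenate([np.sin(pos * freqs), np.cos(pos * freqs)], axis=1)
    return pe * weight.reshape(-1, 1)              # (h*w, d_model)
```

The intent is that, during fusion, attention is biased toward object-related positions rather than treating all grid cells uniformly.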
Xinglong Sun
NVIDIA, Stanford, UIUC
Efficient Deep Learning · Computer Vision · Autonomous Driving
Haijiang Sun
Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China
Shan Jiang
Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China
Jiacheng Wang
Nanyang Technological University
ISAC · GenAI · Low-altitude wireless network · Semantic Communications
Jiasong Wang
Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China