FARTrack: Fast Autoregressive Visual Tracking with High Performance

📅 2026-02-03
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
This work proposes a fast autoregressive visual tracking framework to address the challenge of deploying high-performance trackers on resource-constrained devices, where existing methods suffer from slow inference. The approach integrates task-specific self-distillation with inter-frame autoregressive sparsification, enabling efficient model compression and globally optimal token selection without relying on manually designed distillation pairs or incurring additional computational overhead. Evaluated on the GOT-10k benchmark, the method achieves 70.6% average overlap (AO) while delivering inference speeds of 343 FPS on GPU and 121 FPS on CPU. These results demonstrate a favorable balance between accuracy and efficiency, enabling real-time, high-performance visual tracking.

๐Ÿ“ Abstract
Inference speed and tracking performance are two critical evaluation metrics in the field of visual tracking. However, high-performance trackers often suffer from slow processing speeds, making them impractical for deployment on resource-constrained devices. To alleviate this issue, we propose FARTrack, a Fast Auto-Regressive Tracking framework. Since autoregression emphasizes the temporal nature of the trajectory sequence, it can maintain high performance while achieving efficient execution across various devices. FARTrack introduces Task-Specific Self-Distillation and Inter-frame Autoregressive Sparsification, designed from the perspectives of shallow-yet-accurate distillation and redundant-to-essential token optimization, respectively. Task-Specific Self-Distillation achieves model compression by distilling task-specific tokens layer by layer, improving inference speed while avoiding suboptimal manual teacher-student layer-pair assignments. Meanwhile, Inter-frame Autoregressive Sparsification sequentially condenses multiple templates, avoiding additional runtime overhead while learning a temporally global optimal sparsification strategy. FARTrack demonstrates outstanding speed and competitive performance: it delivers an AO of 70.6% on GOT-10k in real time, and our fastest model reaches 343 FPS on GPU and 121 FPS on CPU.
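To make the token-sparsification idea in the abstract concrete, here is a minimal sketch of keeping only the most informative template tokens between frames. This is an illustration, not the paper's implementation: FARTrack learns a temporally global sparsification strategy end-to-end, whereas the `scores` here are a heuristic stand-in, and the function name `sparsify_tokens` is hypothetical.

```python
import numpy as np

def sparsify_tokens(tokens: np.ndarray, scores: np.ndarray,
                    keep_ratio: float = 0.5) -> np.ndarray:
    """Keep the top-scoring fraction of template tokens.

    tokens: (N, D) array of token embeddings for one template frame.
    scores: (N,) importance score per token (learned in the paper;
            a fixed heuristic here).
    """
    n = tokens.shape[0]
    k = max(1, int(n * keep_ratio))
    keep = np.argsort(scores)[::-1][:k]   # indices of the k highest scores
    return tokens[np.sort(keep)]          # preserve original token order

# Toy example: 8 tokens of dimension 4, scored by L2 norm as a stand-in
rng = np.random.default_rng(0)
tokens = rng.standard_normal((8, 4))
scores = np.linalg.norm(tokens, axis=1)
kept = sparsify_tokens(tokens, scores, keep_ratio=0.5)
print(kept.shape)  # (4, 4)
```

Applied frame by frame to the accumulated templates, this kind of pruning shrinks the attention input and thus the runtime cost, which is the "redundant-to-essential" direction the abstract describes.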
Problem

Research questions and friction points this paper is trying to address.

visual tracking
inference speed
tracking performance
resource-constrained devices
real-time
Innovation

Methods, ideas, or system contributions that make the work stand out.

Autoregressive Tracking
Self-Distillation
Token Sparsification
Real-time Visual Tracking
Model Compression
🔎 Similar Papers
No similar papers found.
Guijie Wang
Department of Software Engineering, Xi'an Jiaotong University

Tong Lin
Department of Software Engineering, Xi'an Jiaotong University

Yifan Bai
Alibaba DAMO Academy
Embodied Intelligence, Autonomous Driving, Visual Generation, AI for Medicine

Anjia Cao
Xi'an Jiaotong University
Data-Efficient Learning, Multimodal Learning, MLLMs

Shiyi Liang
Department of Software Engineering, Xi'an Jiaotong University

Wangbo Zhao
National University of Singapore
Efficient Deep Learning, Dynamic Neural Network, Multimodal Model

Xing Wei
Department of Software Engineering, Xi'an Jiaotong University