UETrack: A Unified and Efficient Framework for Single Object Tracking

📅 2026-03-01
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses three limitations of existing single-object tracking methods: they are predominantly confined to the RGB modality, they struggle to fuse multimodal information efficiently, and their high model complexity hinders deployment on resource-constrained devices. To overcome these challenges, we propose UETrack, a unified and efficient multimodal single-object tracking framework that integrates diverse inputs, including RGB, depth, thermal, event-based data, and language. Its key innovations are a Token-Pooling-based mixture-of-experts mechanism that enhances feature representation and a target-aware adaptive distillation strategy that reduces redundant supervision. Evaluated on 12 benchmarks, UETrack-B achieves 69.2% AUC on LaSOT while running in real time at 163, 56, and 60 FPS on GPU, CPU, and AGX platforms, respectively, giving a better speed-accuracy trade-off than previous methods.
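The 69.2% AUC figure is the area under the success plot used in one-pass evaluation on LaSOT. As a minimal sketch for one sequence, assuming the common OTB-style convention of 21 IoU thresholds from 0 to 1 in steps of 0.05 and "success" meaning IoU strictly above the threshold (these toolkit settings are assumptions, not taken from the paper), the AUC is the mean success rate over the thresholds. The helper name `success_auc` is hypothetical; benchmark scores aggregate such values over many sequences.

```python
def success_auc(ious, n_thresholds=21):
    """Area under the success plot for one sequence (hypothetical helper).

    ious: per-frame IoU between predicted and ground-truth boxes.
    Assumes OTB-style evaluation: success at threshold t means IoU > t,
    and the AUC is approximated by the mean success rate over evenly
    spaced thresholds in [0, 1].
    """
    if not ious:
        raise ValueError("need at least one frame")
    thresholds = [i / (n_thresholds - 1) for i in range(n_thresholds)]
    # Fraction of frames whose IoU strictly exceeds each threshold.
    rates = [sum(1 for v in ious if v > t) / len(ious) for t in thresholds]
    return sum(rates) / len(rates)


# Toy sequence of four frames: perfect, half overlap, lost, good.
print(round(success_auc([1.0, 0.5, 0.0, 0.75]), 4))  # 0.5357
```

Averaging over a fixed threshold grid is the usual discrete approximation of the area; a finer grid changes the value only slightly.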

📝 Abstract
With growing real-world demands, efficient tracking has received increasing attention. However, most existing methods are limited to RGB inputs and struggle in multi-modal scenarios. Moreover, current multi-modal tracking approaches typically rely on complex designs, making them too heavy and slow for resource-constrained deployment. To tackle these limitations, we propose UETrack, an efficient framework for single object tracking. UETrack efficiently handles multiple modalities, including RGB, Depth, Thermal, Event, and Language, and thereby addresses the gap in efficient multi-modal tracking. It introduces two key components: a Token-Pooling-based Mixture-of-Experts mechanism that enhances modeling capacity through feature aggregation and expert specialization, and a Target-aware Adaptive Distillation strategy that selectively performs distillation based on sample characteristics, reducing redundant supervision and improving performance. Extensive experiments on 12 benchmarks across 3 hardware platforms show that UETrack achieves a superior speed-accuracy trade-off compared to previous methods. For instance, UETrack-B achieves 69.2% AUC on LaSOT and runs at 163/56/60 FPS on GPU/CPU/AGX, demonstrating strong practicality and versatility. Code is available at https://github.com/kangben258/UETrack.
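The abstract says the Token-Pooling-based Mixture-of-Experts combines feature aggregation with expert specialization, but gives no further detail. The sketch below is one hypothetical reading, not the authors' design: tokens are average-pooled in fixed windows, each pooled token is routed to its top-1 expert by a softmax gate, and the gated expert output is broadcast back to the tokens it summarizes as a residual update. All names (`token_pooling_moe`, `window`, `gate_w`, `experts`) are illustrative assumptions, and plain Python lists stand in for tensors.

```python
import math
import random


def avg_pool_tokens(tokens, window):
    """Average-pool consecutive groups of `window` tokens (lists of floats)."""
    pooled = []
    for start in range(0, len(tokens), window):
        group = tokens[start:start + window]
        pooled.append([sum(col) / len(group) for col in zip(*group)])
    return pooled


def matvec(weight, vec):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weight]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def token_pooling_moe(tokens, gate_w, experts, window):
    """Hypothetical token-pooling MoE layer with top-1 routing.

    Gating and expert computation run on pooled tokens, so their cost
    scales with len(tokens) / window rather than len(tokens). Each pooled
    token's gated expert output is added to every token in its window.
    """
    out = []
    for g, p in enumerate(avg_pool_tokens(tokens, window)):
        probs = softmax(matvec(gate_w, p))                 # gate score per expert
        k = max(range(len(probs)), key=probs.__getitem__)  # top-1 expert
        update = [probs[k] * y for y in matvec(experts[k], p)]
        for t in tokens[g * window:(g + 1) * window]:
            out.append([a + b for a, b in zip(t, update)])  # residual update
    return out


random.seed(0)
dim, n_experts, n_tokens = 4, 3, 8
tokens = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_tokens)]
gate_w = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_experts)]
experts = [[[random.gauss(0, 0.1) for _ in range(dim)] for _ in range(dim)]
           for _ in range(n_experts)]
out = token_pooling_moe(tokens, gate_w, experts, window=4)
print(len(out), len(out[0]))  # 8 4
```

Routing pooled tokens rather than every token is one way such a layer could keep the extra expert capacity cheap; whether UETrack pools, routes, or combines experts this way is not stated in the summary.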
Problem

Research questions and friction points this paper is trying to address.

single object tracking
multi-modal tracking
efficient tracking
resource-constrained deployment
RGB-D-Thermal-Event-Language
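To relate the reported throughput to a per-frame time budget on constrained hardware, here is a quick conversion. It assumes the FPS figures are single-stream frame rates, so per-frame latency is simply their reciprocal; batched or pipelined measurements would break that assumption. The helper name `per_frame_ms` is hypothetical.

```python
# Reported UETrack-B throughput (frames per second) from the summary.
reported_fps = {"GPU": 163, "CPU": 56, "AGX": 60}


def per_frame_ms(fps):
    """Per-frame time in milliseconds, assuming frames are processed one at a time."""
    return 1000.0 / fps


for device, fps in reported_fps.items():
    print(f"{device}: {per_frame_ms(fps):.1f} ms/frame")
# GPU: 6.1 ms/frame
# CPU: 17.9 ms/frame
# AGX: 16.7 ms/frame
```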
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-modal Tracking
Mixture-of-Experts
Adaptive Distillation
Efficient Tracking
Token Pooling
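The source states only that Target-aware Adaptive Distillation "selectively performs distillation based on sample characteristics" to reduce redundant supervision. As a hypothetical illustration of sample-selective distillation, not the paper's rule, the sketch below applies a feature-matching loss only when the teacher localizes the target noticeably better than the student (IoU gap with the ground-truth box above a margin) and skips the remaining samples. The selection criterion, the MSE feature loss, the `margin` value, and all names are assumptions.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def feature_mse(student, teacher):
    """Mean squared error between two equal-length feature vectors."""
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(student)


def selective_distill_loss(samples, margin=0.05):
    """Hypothetical sample-selective distillation loss.

    A sample is distilled only if the teacher's IoU with the ground-truth
    target box exceeds the student's by more than `margin`; otherwise its
    distillation term is skipped as redundant supervision.
    Returns (mean loss over selected samples, number of selected samples).
    """
    total, selected = 0.0, 0
    for s in samples:
        gap = (box_iou(s["teacher_box"], s["gt_box"])
               - box_iou(s["student_box"], s["gt_box"]))
        if gap > margin:
            total += feature_mse(s["student_feat"], s["teacher_feat"])
            selected += 1
    return (total / selected if selected else 0.0), selected


batch = [
    # Teacher clearly better (IoU 1.0 vs 0.5): distilled.
    {"gt_box": (0, 0, 10, 10), "teacher_box": (0, 0, 10, 10),
     "student_box": (0, 0, 5, 10),
     "student_feat": [1.0, 2.0], "teacher_feat": [1.0, 4.0]},
    # Student already matches the teacher: skipped.
    {"gt_box": (0, 0, 10, 10), "teacher_box": (0, 0, 10, 10),
     "student_box": (0, 0, 10, 10),
     "student_feat": [0.0, 0.0], "teacher_feat": [3.0, 3.0]},
]
print(selective_distill_loss(batch))  # (2.0, 1)
```

Skipping samples where the student already matches the teacher is one way to avoid spending supervision where it adds little; the actual sample characteristics UETrack uses may differ.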