UASTrack: A Unified Adaptive Selection Framework with Modality-Customization in Single Object Tracking

📅 2025-02-25
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing RGB-X single-object trackers face two key bottlenecks in cross-modal (thermal/event/depth) tracking: non-adaptive modality awareness and difficulty in unifying the model architecture and parameters. This paper proposes a unified multi-modal tracking framework that addresses both challenges. First, it introduces a Discriminative Auto-Selector (DAS), a label-free module that identifies the auxiliary modality and discriminates its data distribution. Second, it designs a Task-Customized Optimization Adapter (TCOA) that tailors the optimization trajectory in latent space to each modality, filtering noise redundancy and suppressing background interference. Evaluated on five benchmarks, including LasHeR and GTOT, the method achieves state-of-the-art or competitive performance while introducing only 1.87M additional parameters and 1.95G FLOPs, enhancing both modality adaptability and architectural unification across heterogeneous sensing modalities.

📝 Abstract
Multi-modal tracking is essential in single-object tracking (SOT), as different sensor types contribute unique capabilities to overcome challenges caused by variations in object appearance. However, existing unified RGB-X trackers (X represents the depth, event, or thermal modality) either rely on task-specific training strategies for individual RGB-X image pairs or fail to address the critical importance of modality-adaptive perception in real-world applications. In this work, we propose UASTrack, a unified adaptive selection framework that facilitates both model and parameter unification, as well as adaptive modality discrimination across various multi-modal tracking tasks. To achieve modality-adaptive perception in joint RGB-X pairs, we design a Discriminative Auto-Selector (DAS) capable of identifying modality labels, thereby distinguishing the data distributions of auxiliary modalities. Furthermore, we propose a Task-Customized Optimization Adapter (TCOA) tailored to various modalities in the latent space. This strategy effectively filters noise redundancy and mitigates background interference based on the specific characteristics of each modality. Extensive comparisons conducted on five benchmarks, including LasHeR, GTOT, RGBT234, VisEvent, and DepthTrack, covering RGB-T, RGB-E, and RGB-D tracking scenarios, demonstrate that our approach achieves competitive performance while introducing only 1.87M additional training parameters and 1.95G FLOPs. The code will be available at https://github.com/wanghe/UASTrack.
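The select-then-adapt idea described in the abstract (a label-free selector routes each RGB-X pair to a modality-specific lightweight adapter) can be sketched as follows. This is an illustrative stand-in, not the paper's implementation: the prototype-matching selector, the bottleneck-adapter shape, and all dimensions below are assumptions for the sketch; the actual DAS and TCOA are learned end-to-end inside the tracker.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

class DiscriminativeAutoSelector:
    """Label-free selector sketch: scores an auxiliary-modality feature
    against per-modality prototype vectors and routes to the best match."""
    def __init__(self, prototypes):
        self.prototypes = prototypes  # {modality name: prototype vector}

    def select(self, aux_feat):
        names = list(self.prototypes)
        scores = np.array([aux_feat @ self.prototypes[m] for m in names])
        probs = softmax(scores)
        return names[int(np.argmax(probs))], probs

class TaskCustomizedAdapter:
    """Per-modality bottleneck adapter sketch: down-project, ReLU,
    up-project, residual add -- a common lightweight-adapter pattern."""
    def __init__(self, dim, bottleneck, rng):
        self.down = rng.standard_normal((dim, bottleneck)) * 0.02
        self.up = rng.standard_normal((bottleneck, dim)) * 0.02

    def __call__(self, x):
        # Residual connection keeps the shared backbone feature intact.
        return x + np.maximum(x @ self.down, 0.0) @ self.up

# Routing: the selector picks a modality branch; that branch's adapter
# refines the feature before it re-enters the shared tracking head.
rng = np.random.default_rng(0)
dim = 16
protos = {m: rng.standard_normal(dim) for m in ("thermal", "event", "depth")}
adapters = {m: TaskCustomizedAdapter(dim, 4, rng) for m in protos}

das = DiscriminativeAutoSelector(protos)
aux_feat = protos["event"] + 0.1 * rng.standard_normal(dim)  # event-like input
modality, probs = das.select(aux_feat)
refined = adapters[modality](aux_feat)
```

Because every branch shares the backbone and only the small adapters differ, this layout is consistent with the paper's reported overhead of a few million extra parameters for full RGB-T/E/D coverage.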
Problem

Research questions and friction points this paper is trying to address.

Unified adaptive selection framework
Modality-adaptive perception
Multi-modal tracking challenges
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified adaptive selection framework
Discriminative Auto-Selector modality identification
Task-Customized Optimization Adapter
He Wang
School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, China
Tianyang Xu
School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, China
Zhangyong Tang
Jiangnan University
Xiao-Jun Wu
School of Artificial Intelligence and Computer Science, Jiangnan University
artificial intelligence · pattern recognition · machine learning
Josef Kittler
University of Surrey
engineering