Boosting Self-Supervised Tracking with Contextual Prompts and Noise Learning

📅 2026-05-07
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses a limitation of existing self-supervised tracking methods: without semantic guidance, they struggle to model contextual information from unlabeled videos, which leads to unreliable contextual cues. To overcome this, the authors propose a dual-modal contextual association mechanism. In early training, fine-grained instance-level semantic prompts guide the forward and backward tracking branches in acquiring fundamental tracking knowledge. Later, features perturbed by contextual noise are injected progressively, following an easy-to-hard learning strategy, which makes the learned representations more robust in more complex feature spaces. The mechanism is used only during training and adds no inference overhead. Experiments on multiple benchmarks show that the approach improves both the performance and the robustness of self-supervised tracking.
📝 Abstract
Learning robust contextual knowledge from unlabeled videos is essential for advancing self-supervised tracking. However, conventional self-supervised trackers lack effective context modeling. Existing context association methods based on non-semantic queries also struggle to adapt to unlabeled tracking scenarios, which makes it difficult to learn reliable contextual cues. In this work, we propose a novel self-supervised tracking framework, named \textbf{\tracker}. It introduces a dual-modal context association mechanism that jointly leverages fine-grained semantic prompts and contextual noise to drive the model toward learning robust tracking representations. Following the easy-to-hard learning principle, our context association mechanism operates in two stages. During early training, instance patch tokens (prompts) are assigned to both the forward and backward tracking branches to facilitate the acquisition of tracking knowledge. As training progresses, contextual noise is gradually injected into the model to perturb features, encouraging the tracker to learn robust tracking representations in a more complex feature space. This context association mechanism thus enables our self-supervised model to learn high-quality tracking representations from unlabeled videos. Because it is applied only during training, inference remains efficient. Extensive experiments demonstrate the superiority of our method.
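The two-stage, easy-to-hard schedule described above can be illustrated with a small sketch. Every name and value here is hypothetical and does not come from the paper: `StageConfig`, `prompt_until`, `noise_start`, `max_noise_std`, `prompt_enabled`, `noise_std` and `perturb`. The sketch also makes three modelling assumptions:

- Prompts are used only up to a fixed fraction of training.
- "Contextual noise" is additive Gaussian noise on a feature vector.
- The noise strength ramps up linearly.

It is not the authors' implementation. It only shows how such a curriculum could be gated by training progress and disabled at inference.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class StageConfig:
    # Hypothetical hyperparameters; the abstract gives no concrete values.
    prompt_until: float = 0.3   # fraction of training during which prompts are used
    noise_start: float = 0.3    # fraction of training at which noise injection begins
    max_noise_std: float = 0.5  # noise std reached at the end of training


def prompt_enabled(progress: float, cfg: StageConfig) -> bool:
    """Stage 1 (easy): instance prompts are fed to both tracking branches."""
    return progress < cfg.prompt_until


def noise_std(progress: float, cfg: StageConfig) -> float:
    """Stage 2 (hard): noise std ramps linearly from 0 to max_noise_std."""
    if progress <= cfg.noise_start:
        return 0.0
    ramp = (progress - cfg.noise_start) / (1.0 - cfg.noise_start)
    return cfg.max_noise_std * min(ramp, 1.0)


def perturb(features: Sequence[float], progress: float, cfg: StageConfig,
            training: bool, rng: random.Random) -> List[float]:
    """Add Gaussian noise (an assumed stand-in for contextual noise) in training only."""
    std = noise_std(progress, cfg)
    if not training or std == 0.0:
        return list(features)  # inference path: features pass through unchanged
    return [f + rng.gauss(0.0, std) for f in features]


if __name__ == "__main__":
    cfg = StageConfig()
    rng = random.Random(0)
    for progress in (0.0, 0.3, 0.65, 1.0):
        print(progress, prompt_enabled(progress, cfg), round(noise_std(progress, cfg), 3))
    print(perturb([1.0, 2.0], 1.0, cfg, training=False, rng=rng))
```

Gating on a scalar `progress` keeps the curriculum independent of dataset size. Setting `training=False` returns the features untouched, which mirrors the claim that the mechanism adds no inference overhead.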
Problem

Research questions and friction points this paper is trying to address.

self-supervised tracking
context modeling
unlabeled videos
contextual cues
non-semantic queries
Innovation

Methods, ideas, or system contributions that make the work stand out.

self-supervised tracking
contextual prompts
noise learning
dual-modal context association
robust representation learning
Yaozong Zheng
Guangxi Normal University
Visual Tracking, Multimodal Tracking
Qihua Liang
Key Laboratory of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin 541004, China; University Engineering Research Center of Educational Intelligent Technology, Guangxi Normal University, Guilin 541004, China
Bineng Zhong
Key Laboratory of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin 541004, China; University Engineering Research Center of Educational Intelligent Technology, Guangxi Normal University, Guilin 541004, China
Shuimu Zeng
University of Southampton, Southampton, SO17 1BJ, United Kingdom
Yuanliang Xue
Xi’an Research Institute of High Technology, Xi’an 710025, China
Ning Li
Key Laboratory of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin 541004, China; University Engineering Research Center of Educational Intelligent Technology, Guangxi Normal University, Guilin 541004, China
Shuxiang Song
Key Laboratory of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin 541004, China; University Engineering Research Center of Educational Intelligent Technology, Guangxi Normal University, Guilin 541004, China