🤖 AI Summary
Existing Transformer-based trackers formulate object localization as deterministic coordinate regression and neglect localization uncertainty, which leads to unreliable target state estimation in complex scenarios. To address this, we propose an uncertainty-aware Transformer tracking framework that explicitly models and exploits localization uncertainty. An Uncertainty-Aware Localization Decoder (ULD) jointly predicts corner-based bounding-box coordinates and their associated localization uncertainty, which is then passed to a Prototype Memory Network (PMN). A confidence-driven memory update mechanism feeds high-confidence samples back into the prototype memory bank, enabling uncertainty-guided retrieval of historical features and refinement of the template representation. Extensive experiments demonstrate state-of-the-art performance on major benchmarks, including LaSOT and TrackingNet. Notably, the method improves tracking stability and accuracy under challenging conditions such as severe occlusion, large deformations, and heavy background clutter.
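The summary states that the ULD predicts corner-based coordinates together with their uncertainty, but it does not say how. One common way to obtain both from a single output is to treat each coordinate as a discrete probability distribution over bins: the expected value gives the coordinate and the spread gives the uncertainty. The sketch below illustrates that idea under stated assumptions. The function names, the per-coordinate 1-D logits, and the scalar `confidence` formula are hypothetical and are not taken from UncTrack.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a 1-D list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def coord_with_uncertainty(logits: Sequence[float]) -> Tuple[float, float]:
    """Soft-argmax coordinate (expected bin index) and its standard deviation."""
    probs = softmax(logits)
    mean = sum(i * p for i, p in enumerate(probs))
    var = sum(p * (i - mean) ** 2 for i, p in enumerate(probs))
    return mean, math.sqrt(var)


def decode_box(coord_logits: Sequence[Sequence[float]]):
    """Decode (x1, y1, x2, y2) from four 1-D logit vectors (hypothetical layout).

    Returns the box, the per-coordinate standard deviations, and a
    hypothetical scalar confidence that shrinks as the mean spread grows.
    """
    box, stds = [], []
    for logits in coord_logits:
        mu, sigma = coord_with_uncertainty(logits)
        box.append(mu)
        stds.append(sigma)
    confidence = 1.0 / (1.0 + sum(stds) / len(stds))
    return box, stds, confidence
```

A peaked distribution such as `[0, 0, 10, 0, 0]` decodes to a coordinate near bin 2 with a small standard deviation. A flat distribution decodes to the same coordinate with a much larger spread, so it yields a lower confidence. The exact mapping from spread to confidence is a design choice, and the one above is only illustrative.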
📝 Abstract
Transformer-based trackers have achieved considerable success and become the dominant tracking paradigm owing to their accuracy and efficiency. Despite this substantial progress, most existing approaches treat object tracking as a deterministic coordinate regression problem and largely overlook target localization uncertainty, which hampers a tracker's ability to maintain reliable target state predictions in challenging scenarios. To address this issue, we propose UncTrack, a novel uncertainty-aware transformer tracker that predicts target localization uncertainty and incorporates it into target state inference for more accurate tracking. Specifically, UncTrack uses a transformer encoder to perform feature interaction between the template and search images. The output features are passed to an uncertainty-aware localization decoder (ULD), which coarsely predicts a corner-based localization together with the corresponding localization uncertainty. This uncertainty is then fed into a prototype memory network (PMN), which mines valuable historical information to determine whether the target state prediction is reliable. To enhance the template representation, high-confidence samples are fed back into the prototype memory bank to update it, making the tracker more robust to challenging appearance variations. Extensive experiments demonstrate that our method outperforms other state-of-the-art methods. Our code is available at https://github.com/ManOfStory/UncTrack.
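The abstract says that only high-confidence samples are fed back into the prototype memory bank, but it leaves the update rule unspecified. The sketch below shows one plausible confidence-gated memory under explicit assumptions: a fixed `capacity`, a hypothetical acceptance `threshold`, eviction of the least confident entry when the bank is full, and a confidence-weighted mean as the prototype. None of these choices are confirmed by the source; `PrototypeMemory` and its methods are illustrative names only.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class PrototypeMemory:
    """Hypothetical confidence-gated memory bank (not UncTrack's actual PMN)."""

    capacity: int = 4          # assumed maximum number of stored samples
    threshold: float = 0.7     # assumed minimum confidence for storage
    features: List[List[float]] = field(default_factory=list)
    confidences: List[float] = field(default_factory=list)

    def update(self, feature: Sequence[float], confidence: float) -> bool:
        """Store a sample only if it is confident enough; return True if stored."""
        if confidence < self.threshold:
            return False  # low-confidence predictions never enter the memory
        if len(self.features) < self.capacity:
            self.features.append(list(feature))
            self.confidences.append(confidence)
            return True
        # Memory full: replace the least confident entry if the new one beats it.
        worst = min(range(len(self.confidences)), key=self.confidences.__getitem__)
        if confidence > self.confidences[worst]:
            self.features[worst] = list(feature)
            self.confidences[worst] = confidence
            return True
        return False

    def prototype(self) -> List[float]:
        """Confidence-weighted mean of the stored feature vectors."""
        if not self.features:
            raise ValueError("memory is empty")
        total = sum(self.confidences)
        dim = len(self.features[0])
        return [
            sum(c * f[d] for c, f in zip(self.confidences, self.features)) / total
            for d in range(dim)
        ]
```

With `capacity=2` and `threshold=0.5`, a sample with confidence 0.3 is rejected outright. Once the bank is full, a new sample only replaces the weakest stored entry if it is more confident. Gating on confidence keeps unreliable frames, such as those during occlusion, from contaminating the template representation.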