UncTrack: Reliable Visual Object Tracking with Uncertainty-Aware Prototype Memory Network

📅 2025-03-17
🤖 AI Summary
Existing Transformer-based trackers formulate object localization as deterministic regression and neglect uncertainty modeling, which leads to unreliable state estimation in complex scenarios. To address this, we propose the first Transformer tracking framework that explicitly models and leverages localization uncertainty. Our approach introduces an Uncertainty-Aware Localization Decoder (ULD) that jointly predicts corner-based bounding-box coordinates and their associated localization uncertainty, tightly coupled with a Prototype Memory Network (PMN). Furthermore, we design a confidence-driven memory bank update mechanism that enables uncertainty-guided historical feature retrieval and dynamic template refinement. Extensive experiments demonstrate state-of-the-art performance on major benchmarks including LaSOT and TrackingNet. Notably, our method improves tracking stability and accuracy under challenging conditions such as severe occlusion, large deformations, and heavy background clutter.

📝 Abstract
Transformer-based trackers have achieved promising success and become the dominant tracking paradigm due to their accuracy and efficiency. Despite the substantial progress, most of the existing approaches tackle object tracking as a deterministic coordinate regression problem, while the target localization uncertainty has been greatly overlooked, which hampers trackers' ability to maintain reliable target state prediction in challenging scenarios. To address this issue, we propose UncTrack, a novel uncertainty-aware transformer tracker that predicts the target localization uncertainty and incorporates this uncertainty information for accurate target state inference. Specifically, UncTrack utilizes a transformer encoder to perform feature interaction between template and search images. The output features are passed into an uncertainty-aware localization decoder (ULD) to coarsely predict the corner-based localization and the corresponding localization uncertainty. Then the localization uncertainty is sent into a prototype memory network (PMN) to excavate valuable historical information to identify whether the target state prediction is reliable or not. To enhance the template representation, the samples with high confidence are fed back into the prototype memory bank for memory updating, making the tracker more robust to challenging appearance variations. Extensive experiments demonstrate that our method outperforms other state-of-the-art methods. Our code is available at https://github.com/ManOfStory/UncTrack.
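
The abstract describes feeding high-confidence samples back into the prototype memory bank. The snippet below is a minimal sketch of that gating idea only, not the authors' implementation: the class `PrototypeMemory`, the reliability score (best cosine similarity to stored prototypes, discounted by uncertainty), the threshold, and the first-in-first-out eviction are all assumptions made for illustration, and the uncertainty is assumed to be normalized to [0, 1].

```python
import math
from dataclasses import dataclass, field
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


@dataclass
class PrototypeMemory:
    """Hypothetical confidence-gated memory bank (illustrative only)."""
    capacity: int = 4              # assumed maximum number of prototypes
    conf_threshold: float = 0.7    # assumed gate for accepting a sample
    prototypes: List[Vector] = field(default_factory=list)

    def reliability(self, feature: Vector, uncertainty: float) -> float:
        # Assumed score: similarity to the closest prototype, scaled down
        # as the predicted localization uncertainty grows.
        if not self.prototypes:
            return 1.0 - uncertainty
        best = max(cosine(feature, p) for p in self.prototypes)
        return max(0.0, best) * (1.0 - uncertainty)

    def maybe_update(self, feature: Vector, uncertainty: float) -> bool:
        # Store the sample only when it is judged reliable.
        if self.reliability(feature, uncertainty) < self.conf_threshold:
            return False
        self.prototypes.append(list(feature))
        if len(self.prototypes) > self.capacity:
            self.prototypes.pop(0)  # evict the oldest prototype
        return True
```

In this sketch a frame with high predicted uncertainty, or one whose features drift far from every stored prototype, is rejected and leaves the memory unchanged, which is the behavior the abstract attributes to updating with high-confidence samples.
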
Problem

Research questions and friction points this paper is trying to address.

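Addresses the neglect of target localization uncertainty in trackers that treat tracking as deterministic coordinate regression.
Proposes UncTrack, which predicts localization uncertainty and uses it for target state inference.
Enhances tracker robustness to appearance variations by updating a prototype memory bank with high-confidence samples.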
Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformer encoder for feature interaction
Uncertainty-aware localization decoder (ULD)
Prototype memory network (PMN) that uses historical information to judge whether a prediction is reliable
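
To make "corner-based localization with a corresponding uncertainty" concrete, here is a minimal sketch under stated assumptions. It treats a corner score map as a probability distribution via softmax, takes the expected position as the corner estimate (the common soft-argmax construction), and uses the spatial variance of that distribution as a stand-in uncertainty. The function name `corner_with_uncertainty` and the choice of variance as the uncertainty measure are illustrative assumptions; the page does not specify how the ULD computes its uncertainty.

```python
import math
from typing import List, Tuple


def corner_with_uncertainty(score_map: List[List[float]]) -> Tuple[float, float, float]:
    """Return (x, y, variance) for one corner from a 2D score map.

    x indexes columns and y indexes rows. The variance is an assumed
    proxy for localization uncertainty: it is near zero for a sharply
    peaked map and grows as the map becomes flat or multi-modal.
    """
    # Numerically stable softmax over all cells of the map.
    peak = max(s for row in score_map for s in row)
    exps = [[math.exp(s - peak) for s in row] for row in score_map]
    total = sum(sum(row) for row in exps)
    probs = [[e / total for e in row] for row in exps]

    # Expected corner position under the probability map (soft-argmax).
    x = sum(p * j for row in probs for j, p in enumerate(row))
    y = sum(p * i for i, row in enumerate(probs) for p in row)

    # Spatial spread of the distribution around the expected position.
    var = sum(
        p * ((j - x) ** 2 + (i - y) ** 2)
        for i, row in enumerate(probs)
        for j, p in enumerate(row)
    )
    return x, y, var
```

A sharply peaked map yields a corner at the peak with variance close to zero, while a flat map yields the map center with a large variance, so a downstream module could use the variance to decide how much to trust the prediction.
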
Siyuan Yao
University of Notre Dame
Visualization, Computer Graphics, Computer Vision
Yang Guo
School of Computer Science (National Pilot Software Engineering School), Beijing University of Posts and Telecommunications, Beijing 100876, China
Yanyang Yan
State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
Wenqi Ren
School of Cyber Science and Technology, Shenzhen Campus, Sun Yat-sen University, Shenzhen 518107, China
Xiaochun Cao
Sun Yat-sen University
Computer Vision, Artificial Intelligence, Multimedia, Machine Learning