RELO: Reinforcement Learning to Localize for Visual Object Tracking

📅 2026-05-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the misalignment between traditional visual object tracking methods, which rely on handcrafted spatial priors such as heatmaps, and standard evaluation metrics such as IoU and AUC. To overcome this limitation, the authors formulate object localization as a Markov decision process and use reinforcement learning to directly optimize a localization policy, with a reward that combines frame-level IoU and sequence-level AUC, thereby eliminating the need for manual priors. They also introduce a layer-aligned temporal token propagation mechanism that improves inter-frame semantic consistency with negligible computational overhead. The proposed method reports superior results across multiple benchmarks, attaining an AUC of 57.5% on LaSOT_ext without template updates.

📝 Abstract

Conventional visual object trackers localize targets using handcrafted spatial priors, often in the form of heatmaps. Such priors provide only surrogate supervision and are poorly aligned with tracking optimization and evaluation metrics, such as intersection over union (IoU) and area under the success curve (AUC). Here, we introduce RELO, a REinforcement-learning-to-LOcalize method for visual object tracking that formulates target localization as a Markov decision process. Specifically, RELO replaces handcrafted spatial priors with a localization policy learned over spatial positions via reinforcement learning, with rewards combining frame-level IoU and sequence-level AUC. We additionally introduce layer-aligned temporal token propagation to improve semantic consistency across frames, with negligible computational overhead. Across multiple benchmarks, RELO achieves superior results, attaining 57.5% AUC on LaSOT_ext without template updates. This confirms that reward-driven localization provides an effective alternative to prior-driven localization for visual object tracking.
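
To make the reward idea concrete, here is a minimal sketch of how a frame-level IoU term and a sequence-level AUC term could be combined into per-frame rewards. It is not the authors' implementation: the abstract does not give the reward formula, so the additive form, the `seq_weight` parameter, the function names, and the `(x1, y1, x2, y2)` box format are all assumptions made for illustration. The success-curve AUC follows one common convention (mean success rate over 21 evenly spaced overlap thresholds in [0, 1]).

```python
import numpy as np


def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the intersection rectangle, clamped at zero.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def success_auc(ious, num_thresholds=21):
    """Area under the success curve: mean fraction of frames whose IoU
    exceeds each overlap threshold in [0, 1]."""
    ious = np.asarray(ious, dtype=float)
    thresholds = np.linspace(0.0, 1.0, num_thresholds)
    success = [(ious > t).mean() for t in thresholds]
    return float(np.mean(success))


def combined_rewards(pred_boxes, gt_boxes, seq_weight=0.5):
    """Per-frame rewards: frame IoU plus a shared sequence-level AUC bonus.

    The additive combination and seq_weight are assumptions for this
    sketch; the paper's actual weighting may differ.
    """
    frame_ious = np.array([iou(p, g) for p, g in zip(pred_boxes, gt_boxes)])
    seq_auc = success_auc(frame_ious)
    return frame_ious + seq_weight * seq_auc


if __name__ == "__main__":
    preds = [(0, 0, 2, 2), (1, 1, 3, 3)]
    gts = [(0, 0, 2, 2), (0, 0, 2, 2)]
    print(combined_rewards(preds, gts))
```

In a policy-gradient setup, rewards of this kind would weight the log-probability of the sampled spatial position for each frame, so the training signal is computed from the evaluation metrics themselves instead of from a heatmap surrogate.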

Problem

Research questions and friction points this paper is trying to address.

visual object tracking

localization

spatial priors

reinforcement learning

IoU

Innovation

Methods, ideas, or system contributions that make the work stand out.

reinforcement learning

visual object tracking

localization policy