🤖 AI Summary
Long-term tracking of arbitrary points in videos remains challenging due to occlusion, drift, and failures in re-localization. To address this, we propose a probabilistic multi-source fusion tracking framework. Methodologically, we introduce the first joint model of optical flow motion priors and semantic feature matching, dynamically weighting short-term flow predictions against long-term feature re-identification via a probabilistic integration mechanism; we additionally incorporate self-supervised trajectory optimization for end-to-end unsupervised training. Our key contributions are: (1) the first probabilistic multi-source prediction architecture tailored to arbitrary-point tracking; (2) explicit decoupling of short-term motion smoothness from long-term robustness, significantly mitigating point loss under occlusion; and (3) state-of-the-art performance among unsupervised and self-supervised methods across multiple benchmarks, with several metrics surpassing current supervised approaches.
📝 Abstract
In this paper, we propose ProTracker, a novel framework for robust and accurate long-term dense tracking of arbitrary points in videos. The key idea of our method is to incorporate probabilistic integration that refines multiple predictions from both optical flow and semantic features for robust short-term and long-term tracking. Specifically, we integrate optical flow estimations in a probabilistic manner, producing smooth and accurate trajectories by maximizing the likelihood of each prediction. To effectively re-localize challenging points that disappear and reappear due to occlusion, we further incorporate long-term feature correspondence into our flow predictions for continuous trajectory generation. Extensive experiments show that ProTracker achieves state-of-the-art performance among unsupervised and self-supervised approaches, and even outperforms supervised methods on several benchmarks. Our code and model will be publicly available upon publication.
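To give a flavor of the probabilistic integration the abstract describes, the following is a minimal sketch (not the authors' implementation) of how multiple point-position predictions can be fused by maximizing their joint likelihood. Assuming each prediction is an independent Gaussian with its own uncertainty, the maximum-likelihood fused estimate reduces to an inverse-variance weighted mean, so confident predictions dominate and uncertain ones are down-weighted; the function name and example values here are illustrative only.

```python
import numpy as np

def fuse_gaussian_predictions(means, variances):
    """Fuse independent Gaussian position predictions by maximizing
    their joint likelihood. The product of Gaussians yields an
    inverse-variance weighted mean and a reduced fused variance."""
    means = np.asarray(means, dtype=float)        # shape: (n_predictions, dims)
    variances = np.asarray(variances, dtype=float)
    weights = 1.0 / variances                     # precision of each prediction
    fused_var = 1.0 / weights.sum(axis=0)
    fused_mean = fused_var * (weights * means).sum(axis=0)
    return fused_mean, fused_var

# Illustrative example: three predictions of a 2-D point location,
# where the third (high-variance) prediction contributes the least.
means = [[10.0, 5.0], [10.4, 5.2], [12.0, 6.0]]
variances = [[0.5, 0.5], [0.5, 0.5], [4.0, 4.0]]
fused_mean, fused_var = fuse_gaussian_predictions(means, variances)
```

Note that the fused variance is smaller than any individual prediction's variance, which is the sense in which combining short-term flow predictions with long-term feature correspondences can yield trajectories more reliable than either source alone.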