🤖 AI Summary
Existing point tracking models suffer from performance degradation in real-world videos due to domain shift and the scarcity of dense ground-truth annotations. To address this, the work proposes a verifier-guided pseudo-label generation framework that aggregates candidate trajectories from multiple pretrained trackers and employs a learnable meta-verifier to dynamically assess the reliability of per-frame predictions. High-quality pseudo-labels selected by the verifier are then used for self-supervised fine-tuning. This approach substantially improves both pseudo-label quality and data-utilization efficiency, achieving state-of-the-art performance on four real-world benchmarks while requiring significantly less fine-tuning data than existing self-training methods.
📝 Abstract
Models for long-term point tracking are typically trained on large synthetic datasets. The performance of these models degrades in real-world videos due to differing visual characteristics and the absence of dense ground-truth annotations. Self-training on unlabeled videos has been explored as a practical solution, but the quality of pseudo-labels depends strongly on the reliability of teacher models, which varies across frames and scenes. In this paper, we address the problem of real-world fine-tuning and introduce the verifier, a meta-model that learns to assess the reliability of tracker predictions and guide pseudo-label generation. Given candidate trajectories from multiple pretrained trackers, the verifier evaluates them per frame and selects the most trustworthy predictions, yielding high-quality pseudo-label trajectories. When applied to fine-tuning, verifier-guided pseudo-labeling substantially improves the quality of supervision and enables data-efficient adaptation to unlabeled videos. Extensive experiments on four real-world benchmarks demonstrate that our approach achieves state-of-the-art results while requiring less data than prior self-training methods. Project page: https://kuis-ai.github.io/track_on_r
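The per-frame selection the abstract describes can be sketched roughly as follows. This is an illustrative simplification, not the paper's implementation: the actual verifier is a learned meta-model, whereas here a precomputed reliability score, the function name `select_pseudo_labels`, and the threshold of 0.5 are all assumptions for the sake of the example.

```python
def select_pseudo_labels(candidates, scores, threshold=0.5):
    """Pick, per frame, the candidate prediction with the highest verifier score.

    candidates: list of K trajectories, one per pretrained tracker,
                each a list of T (x, y) points.
    scores:     list of K lists of T per-frame reliability scores in [0, 1]
                (assumed output of a verifier; illustrative only).
    Returns (trajectory, mask): the selected (x, y) per frame, plus a flag
    marking frames deemed reliable enough to serve as pseudo-labels.
    """
    num_frames = len(candidates[0])
    trajectory, mask = [], []
    for t in range(num_frames):
        # The verifier picks the most trustworthy tracker for this frame.
        best = max(range(len(candidates)), key=lambda k: scores[k][t])
        trajectory.append(candidates[best][t])
        # Frames below the threshold are excluded from fine-tuning supervision.
        mask.append(scores[best][t] >= threshold)
    return trajectory, mask
```

Fine-tuning would then use only the masked-in frames of the selected trajectory as supervision, which is how the verifier filters out unreliable teacher predictions on a per-frame basis.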