🤖 AI Summary
In visual object tracking, jointly scaling model size, training data volume, and input resolution often leads to optimization difficulties, while existing methods suffer from insufficient iterative refinement and suboptimal multi-scale joint optimization. To address these challenges, we propose DT-Training, a novel training framework featuring a progressive scaling strategy, knowledge distillation from a compact teacher model, and a dual-branch feature alignment mechanism, which together enable cross-scale parameter sharing and iterative performance improvement. DT-Training supports efficient multi-resolution modeling without architectural modifications. Evaluated on mainstream benchmarks including LaSOT and TrackingNet, it achieves significant improvements over state-of-the-art methods, with tracking accuracy gains of 2.3%–4.1%. Moreover, it enhances generalization and cross-task transferability, demonstrating robustness across diverse tracking scenarios.
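The summary names two training-time components: distillation from a compact teacher and a dual-branch feature alignment term. A minimal sketch of how such a combined objective is typically assembled is shown below; the function names, loss weights, and temperature are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def distillation_loss(student_logits, teacher_logits, tau=2.0):
    # KL divergence between temperature-softened teacher and student
    # distributions (a standard knowledge-distillation term; tau is assumed).
    p_t = softmax(teacher_logits / tau)
    p_s = softmax(student_logits / tau)
    return float(np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12))))

def alignment_loss(student_feat, teacher_feat):
    # Mean squared error between the two branches' features, standing in
    # for the paper's dual-branch feature alignment mechanism.
    return float(np.mean((student_feat - teacher_feat) ** 2))

def dt_training_loss(task_loss, student_logits, teacher_logits,
                     student_feat, teacher_feat, w_kd=0.5, w_align=0.1):
    # Hypothetical combined objective: tracking loss plus weighted
    # distillation and alignment terms (weights are placeholders).
    return (task_loss
            + w_kd * distillation_loss(student_logits, teacher_logits)
            + w_align * alignment_loss(student_feat, teacher_feat))
```

Both auxiliary terms vanish when student and teacher agree, so they only steer optimization where the scaled model diverges from the compact teacher.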
📄 Abstract
In this work, we propose a progressive scaling training strategy for visual object tracking, systematically analyzing the influence of training data volume, model size, and input resolution on tracking performance. Our empirical study reveals that while scaling each factor leads to significant improvements in tracking accuracy, naive training suffers from suboptimal optimization and limited iterative refinement. To address this issue, we introduce DT-Training, a progressive scaling framework that integrates small teacher transfer and dual-branch alignment to maximize model potential. The resulting scaled tracker consistently outperforms state-of-the-art methods across multiple benchmarks, demonstrating strong generalization and transferability of the proposed method. Furthermore, we validate the broader applicability of our approach to additional tasks, underscoring its versatility beyond tracking.
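The progressive scaling strategy above grows the three factors in stages rather than all at once, with later stages reusing parameters from earlier ones. A sketch of such a staged schedule follows; the specific stage values and the non-decreasing constraint are assumptions for illustration, not the paper's schedule.

```python
# Hypothetical stage schedule: (fraction of training data, model width,
# input resolution). Actual values in the paper may differ.
STAGES = [
    (0.25, 256, 224),
    (0.50, 384, 288),
    (1.00, 512, 384),
]

def progressive_schedule(stages):
    """Yield each stage's configuration in order, checking that every
    scaling factor is non-decreasing so a later stage can inherit
    (share) parameters trained at the previous scale."""
    prev = (0.0, 0, 0)
    for stage in stages:
        assert all(c >= p for c, p in zip(stage, prev)), \
            "scaling factors must not shrink between stages"
        prev = stage
        yield stage
```

Each yielded configuration would drive one training phase, initialized from the checkpoint of the phase before it, which is what makes the refinement iterative rather than a single naive joint scale-up.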