🤖 AI Summary
Generic object tracking remains challenging because of occlusion, appearance variation, and similar distractors. This paper systematically surveys three mainstream paradigms (Siamese, discriminative, and Transformer-based trackers), with particular emphasis on the emerging Transformer-based paradigm. It proposes a novel taxonomy for generic trackers and a unified visual and tabular comparison framework that analyzes methods by model architecture, spatiotemporal modeling mechanism, and training strategy. Quantitative comparisons are reported on major benchmarks, including LaSOT and TrackingNet. The results indicate that Transformer-based trackers achieve superior robustness and generalization, primarily owing to their global context modeling and their ability to capture long-range dependencies. The survey traces the technical evolution of generic tracking, identifies current bottlenecks (computational overhead, data hunger, and limited domain adaptability), and highlights promising future directions such as lightweight design, few-shot adaptation, and cross-domain transfer learning.
📝 Abstract
Generic object tracking remains an important yet challenging task in computer vision because of complex spatio-temporal dynamics, especially in the presence of occlusions, similar distractors, and appearance variations. Over the past two decades, a wide range of tracking paradigms, including Siamese-based trackers, discriminative trackers, and, more recently, transformer-based approaches, have been introduced to address these challenges. Whereas existing surveys in this field either concentrate on a single category or cover several categories broadly, this paper presents a comprehensive review of all three, with particular emphasis on the rapidly evolving transformer-based methods. We analyze the core design principles, innovations, and limitations of each approach through both qualitative and quantitative comparisons. Our study introduces a novel categorization and offers a unified visual and tabular comparison of representative methods. Additionally, we organize existing trackers from multiple perspectives and summarize the major evaluation benchmarks, highlighting the rapid advances in transformer-based tracking, which are driven by its robust spatio-temporal modeling capabilities.