🤖 AI Summary
The absence of a publicly available multi-object tracking (MOT) benchmark for American football hinders progress in sports video analysis. Method: We introduce FootballMOT, the first fine-grained detection and tracking dataset tailored to this sport, capturing high-density scenes, severe occlusions, and frequent physical interactions. Our approach integrates a fine-tuned detector, re-identification feature enhancement, and an end-to-end tracking framework via collaborative optimization. Results: Fine-tuning the detector significantly improves recall for small and occluded targets. Our tracking system achieves 62.3% mOTA on FootballMOT—outperforming state-of-the-art methods by an average of 4.7%—and reduces ID switches by 31.5%. This work establishes a new standard benchmark and provides a reproducible technical pipeline for MOT research in complex, contact-intensive sports scenarios.
📝 Abstract
Multi-Object Tracking (MOT) plays a critical role in analyzing player behavior from videos, enabling performance evaluation. Current MOT methods are often evaluated using publicly available datasets. However, most of these focus on everyday scenarios such as pedestrian tracking or are tailored to specific sports, including soccer and basketball. Despite the inherent challenges of tracking players in American football, such as frequent occlusion and physical contact, no standardized dataset has been publicly available, making fair comparisons between methods difficult. To address this gap, we constructed the first dedicated detection and tracking dataset for the American football players and conducted a comparative evaluation of various detection and tracking methods. Our results demonstrate that accurate detection and tracking can be achieved even in crowded scenarios. Fine-tuning detection models improved performance over pre-trained models. Furthermore, when these fine-tuned detectors and re-identification models were integrated into tracking systems, we observed notable improvements in tracking accuracy compared to existing approaches. This work thus enables robust detection and tracking of American football players in challenging, high-density scenarios previously underserved by conventional methods.