AI Summary
Ball tracking in sports video is not robust enough under occlusion, which hampers event detection and referee assistance. To address this, we propose TOTNet, a ball detection and tracking framework designed for heavily occluded scenes. TOTNet combines 3D convolutions that model spatiotemporal dynamics, a visibility-weighted loss that explicitly accounts for occlusion state, and an occlusion-focused data augmentation strategy. We also introduce TTA, a new occlusion-rich table tennis dataset. Experiments on four datasets covering tennis, badminton, and table tennis show that TOTNet substantially improves tracking accuracy: RMSE decreases from 37.30 to 7.19 pixels, and accuracy on fully occluded frames rises from 0.63 to 0.80, outperforming prior state-of-the-art methods.
Abstract
Robust ball tracking under occlusion remains a key challenge in sports video analysis, affecting tasks such as event detection and officiating. We present TOTNet, a Temporal Occlusion Tracking Network that leverages 3D convolutions, a visibility-weighted loss, and occlusion augmentation to improve performance under partial and full occlusion. Developed in collaboration with Paralympics Australia, TOTNet is designed for real-world sports analytics. We introduce TTA, a new occlusion-rich table tennis dataset collected from professional-level Paralympic matches, comprising 9,159 samples with 1,996 occlusion cases. Evaluated on four datasets across tennis, badminton, and table tennis, TOTNet significantly outperforms prior state-of-the-art methods, reducing RMSE from 37.30 to 7.19 and improving accuracy on fully occluded frames from 0.63 to 0.80. These results demonstrate TOTNet's effectiveness for offline sports analytics in fast-paced scenarios. Code and data access: https://github.com/AugustRushG/TOTNet.
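To make the two quantities named above more concrete, the following is a minimal sketch of an RMSE metric over predicted ball positions and a visibility-weighted error. It is not taken from the TOTNet code. The function names, the three visibility labels, the weight values, and the use of squared pixel error (rather than whatever loss the paper actually weights) are all assumptions made for illustration.

```python
import math

# Hypothetical per-visibility weights (NOT from the paper): frames where the
# ball is more occluded contribute more to the loss.
VISIBILITY_WEIGHTS = {"visible": 1.0, "partial": 2.0, "full": 3.0}


def rmse(pred, truth):
    """Root mean squared Euclidean error over paired (x, y) positions."""
    squared = [
        (px - tx) ** 2 + (py - ty) ** 2
        for (px, py), (tx, ty) in zip(pred, truth)
    ]
    return math.sqrt(sum(squared) / len(squared))


def visibility_weighted_loss(pred, truth, visibility, weights=VISIBILITY_WEIGHTS):
    """Weighted mean of squared position errors.

    Each frame's squared error is scaled by the weight of its visibility
    label, then normalised by the sum of the weights. This is an assumed,
    simplified stand-in for a visibility-weighted loss.
    """
    total = 0.0
    weight_sum = 0.0
    for (px, py), (tx, ty), vis in zip(pred, truth, visibility):
        w = weights[vis]
        total += w * ((px - tx) ** 2 + (py - ty) ** 2)
        weight_sum += w
    return total / weight_sum
```

With these assumed weights, an error on a fully occluded frame counts three times as much as the same error on a fully visible frame, which is one simple way a loss could push a model toward better predictions when the ball is hidden.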