🤖 AI Summary
This work proposes a Vision Transformer-based method for automatically detecting high-risk tackling actions in American football practice videos to improve player safety. To address the scarcity of hazardous tackle samples, the authors introduce an imbalance-aware training strategy and construct a substantially expanded video dataset of 733 annotated tackle clips, each temporally localized around the first point of contact. This study represents the first application of Vision Transformers to this task, enabling classification of risky tackles from these clips. Under cross-validation, the model achieves a recall of 0.67 and an F1 score of 0.59 for high-risk tackles. This improves recall by more than 8 percentage points over the previous baseline (recall 0.58, F1 0.56, reported on a smaller subset), and it does so on a much larger dataset.
📝 Abstract
Early identification of hazardous actions in contact sports enables timely intervention and improves player safety. We present a method for detecting risky tackles in American football practice videos and introduce a substantially expanded dataset for this task. The dataset contains 733 tackle clips, each showing a single athlete tackling a dummy. Each clip is temporally localized around the first point of contact and labeled with the strike-zone component of the standardized Assessment for Tackling Technique (SATT-3), extending prior work that reported 178 annotated videos. Using a Vision Transformer-based model with imbalance-aware training, we obtain a risky-class recall of 0.67 and a risky-class F1 of 0.59 under cross-validation. Relative to the previous baseline on a smaller subset (risky recall 0.58; risky F1 0.56), our approach improves risky recall by more than 8 percentage points on a much larger dataset. These results indicate that Vision Transformer-based video analysis, coupled with careful handling of class imbalance, can reliably detect rare but safety-critical tackling patterns, offering a practical pathway toward coach-centered injury prevention tools.
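The reported numbers are per-class metrics computed on the rare "risky" class, not accuracy over all clips. The sketch below shows how such a risky-class recall and F1 can be computed from predictions. It also shows one common imbalance-aware choice: inverse-frequency class weights that could be passed to a weighted loss. It is a minimal illustration under stated assumptions. The label strings `"risky"` / `"non-risky"` and the helper names `risky_metrics` and `inverse_frequency_weights` are hypothetical. Inverse-frequency weighting is a swapped-in example technique, not necessarily the authors' strategy. The abstract also does not say whether cross-validation scores are pooled across folds or averaged per fold.

```python
from collections import Counter

# Hypothetical label names; the abstract only distinguishes risky vs. other tackles.
RISKY = "risky"


def risky_metrics(y_true, y_pred, positive=RISKY):
    """Recall, precision, and F1 for the positive (risky) class only."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a fold has no risky clips or predictions.
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return recall, precision, f1


def inverse_frequency_weights(labels):
    """Assumed imbalance-aware scheme: weight each class by N / (K * count_c),
    so rare classes (e.g. risky tackles) contribute more to a weighted loss."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


if __name__ == "__main__":
    y_true = ["risky", "non-risky", "risky", "non-risky", "non-risky", "risky"]
    y_pred = ["risky", "non-risky", "non-risky", "risky", "non-risky", "risky"]
    print(risky_metrics(y_true, y_pred))  # recall, precision, F1 for "risky"
    print(inverse_frequency_weights(["risky"] * 2 + ["non-risky"] * 8))
    # Gap between the reported recalls, in percentage points (0.67 vs. 0.58).
    print(round((0.67 - 0.58) * 100))
```

With 2 risky and 8 non-risky labels, the weights are 2.5 and 0.625, so each risky clip counts four times as much as a non-risky one. The last line gives 9, which matches the abstract's "more than 8 percentage points" improvement in risky recall.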