🤖 AI Summary
This study addresses the sports decision-making problem of predicting penalty kick direction, tackling the scarcity of high-quality annotated data by constructing the first manually annotated penalty kick video dataset. Methodologically, the authors propose a multimodal deep learning framework that fuses spatiotemporal action features extracted by 22 backbone models spanning seven architecture families (including MViTv2, SlowFast, and I3D) and jointly models them with contextual metadata. Experiments demonstrate a prediction accuracy of 63.9%, surpassing the average performance of professional goalkeepers (~58%) and validating the efficacy and generalizability of action recognition techniques for real-time sports anticipation. The core contributions are threefold: (1) the first dedicated, expert-annotated penalty kick video dataset; (2) a multimodal modeling paradigm tailored to penalty direction prediction; and (3) empirical evidence that the model outperforms human experts in this domain.
📝 Abstract
Action anticipation has become a prominent topic in Human Action Recognition (HAR). However, its application to real-world sports scenarios remains limited by the availability of suitable annotated datasets. This work presents a novel dataset of manually annotated soccer penalty kicks for predicting shot direction from pre-kick player movements. To benchmark this dataset, we propose a deep learning classifier that integrates HAR-based feature embeddings with contextual metadata. We evaluate twenty-two backbone models across seven architecture families (MViTv2, MViTv1, SlowFast, Slow, X3D, I3D, C2D), achieving up to 63.9% accuracy in predicting shot direction (left or right), outperforming the real goalkeepers' decisions. These results demonstrate the dataset's value for anticipatory action recognition and validate our model's potential as a generalizable approach for sports-based predictive tasks.
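To illustrate the fusion idea described in the abstract, the sketch below shows one common way such a classifier can be structured: a clip-level HAR embedding is concatenated with a contextual-metadata vector, and a linear layer plus sigmoid produces a left/right probability. This is a minimal sketch under assumed dimensions, not the authors' actual implementation; all feature names, sizes, and weights here are hypothetical.

```python
import math
import random

def fuse_and_predict(har_embedding, metadata, weights, bias):
    """Late fusion sketch: concatenate a pooled HAR clip embedding with
    encoded contextual metadata, then apply a linear layer + sigmoid.
    Returns P(shot goes right); P(left) = 1 - P(right)."""
    features = har_embedding + metadata  # simple list concatenation
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical example: a 4-d HAR embedding and 2 metadata features
# (e.g., kicker's preferred foot, match minute), with random weights
# standing in for learned parameters.
random.seed(0)
har = [0.12, -0.50, 0.33, 0.08]   # pooled backbone features (made up)
meta = [1.0, 0.75]                # encoded contextual metadata (made up)
w = [random.uniform(-1, 1) for _ in range(len(har) + len(meta))]
p_right = fuse_and_predict(har, meta, w, bias=0.0)
prediction = "right" if p_right >= 0.5 else "left"
```

In a real system the embedding would come from one of the evaluated backbones (e.g., MViTv2 or SlowFast) and the linear head would be trained end-to-end; the point here is only the concatenate-then-classify structure.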