Predicting Penalty Kick Direction Using Multi-Modal Deep Learning with Pose-Guided Attention

📅 2025-09-30

🤖 AI Summary
This study addresses early prediction of soccer penalty kick direction (left, middle, or right) before ball contact, with the goal of helping goalkeepers infer the kicker's intent. We propose a lightweight, interpretable multimodal deep learning framework with two branches, one for RGB frames and one for 2D pose keypoints. A MobileNetV2-based CNN extracts spatial features, an LSTM with attention models how the pose evolves over time, and a pose-guided attention mechanism steers visual focus toward task-relevant regions. A distance-based thresholding step segments each input sequence immediately before ball contact, so that inputs stay consistent across diverse footage. On a held-out test set drawn from a custom dataset of 755 penalty kicks, our method achieves 89% accuracy, outperforming visual-only and pose-only baselines by 14–22%. With an inference time of 22 ms, the framework is suited to real-time use in goalkeeper training and tactical analysis.

📝 Abstract
Penalty kicks often decide championships, yet goalkeepers must anticipate the kicker's intent from subtle biomechanical cues within a very short time window. This study introduces a real-time, multi-modal deep learning framework to predict the direction of a penalty kick (left, middle, or right) before ball contact. The model uses a dual-branch architecture: a MobileNetV2-based CNN extracts spatial features from RGB frames, while 2D keypoints are processed by an LSTM network with attention mechanisms. Pose-derived keypoints further guide visual focus toward task-relevant regions. A distance-based thresholding method segments input sequences immediately before ball contact, ensuring consistent input across diverse footage. A custom dataset of 755 penalty kick events was created from real match videos, with frame-level annotations for object detection, shooter keypoints, and final ball placement. The model achieved 89% accuracy on a held-out test set, outperforming visual-only and pose-only baselines by 14-22%. With an inference time of 22 milliseconds, the lightweight and interpretable design makes it suitable for goalkeeper training, tactical analysis, and real-time game analytics.
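
The abstract states that pose-derived keypoints guide visual focus toward task-relevant regions but does not say how. One common way to do this, shown here purely as an illustrative assumption and not as the authors' implementation, is to place a Gaussian bump at each keypoint and use the resulting map to reweight a CNN feature map. The function names, the `sigma` and `floor` parameters, and the toy tensor sizes are all hypothetical.

```python
import math

def pose_attention_map(keypoints, height, width, sigma=1.5):
    """Build an H x W map with a Gaussian bump at each (x, y) keypoint.

    Assumption: keypoints are already scaled to feature-map coordinates.
    """
    amap = [[0.0] * width for _ in range(height)]
    for kx, ky in keypoints:
        for y in range(height):
            for x in range(width):
                d2 = (x - kx) ** 2 + (y - ky) ** 2
                # Keep the strongest response when bumps overlap.
                amap[y][x] = max(amap[y][x], math.exp(-d2 / (2 * sigma ** 2)))
    return amap

def apply_attention(features, amap, floor=0.1):
    """Reweight a C x H x W feature map by the attention map.

    `floor` (hypothetical) keeps a little signal far from any joint,
    so background context is damped rather than erased.
    """
    return [
        [[v * (floor + (1.0 - floor) * a) for v, a in zip(frow, arow)]
         for frow, arow in zip(channel, amap)]
        for channel in features
    ]

# Toy example: 2 channels on a 4 x 6 grid, two keypoints given as (x, y).
features = [[[1.0] * 6 for _ in range(4)] for _ in range(2)]
keypoints = [(1, 1), (4, 2)]
amap = pose_attention_map(keypoints, height=4, width=6)
attended = apply_attention(features, amap)
```

In this sketch, cells at a keypoint keep their full activation while cells far from every keypoint are scaled down toward `floor`. A learned attention module could replace the fixed Gaussian without changing the overall idea.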
Problem

Research questions and friction points this paper is trying to address.

Predicting penalty kick direction using multi-modal deep learning
Analyzing biomechanical cues before ball contact in real-time
Developing pose-guided attention for goalkeeper training applications
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-modal deep learning with pose-guided attention
Dual-branch architecture combining CNN and LSTM
Distance-based thresholding for consistent input segmentation
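
The distance-based thresholding idea can be sketched as follows. The summary does not specify which distance is thresholded, so this example assumes it is the pixel distance between the kicking foot and the ball, treats the first frame under the threshold as contact, and keeps a fixed window of frames before it. The function name, `threshold`, and `window` are hypothetical choices made for illustration.

```python
import math

def frames_before_contact(foot_xy, ball_xy, threshold, window):
    """Return (start, end) frame indices of the clip ending just before contact.

    Assumption: contact is approximated as the first frame where the foot is
    within `threshold` pixels of the ball. The range is half-open, so the
    contact frame `end` itself is excluded. Returns None if the foot never
    gets that close.
    """
    for i, (foot, ball) in enumerate(zip(foot_xy, ball_xy)):
        if math.dist(foot, ball) <= threshold:
            start = max(0, i - window)
            return start, i
    return None

# Toy run-up: the foot approaches a stationary ball by 10 px per frame.
foot_xy = [(100.0 - 10.0 * t, 0.0) for t in range(11)]
ball_xy = [(0.0, 0.0)] * 11
clip = frames_before_contact(foot_xy, ball_xy, threshold=15.0, window=5)
# clip == (4, 9): frames 4..8 are kept, frame 9 is the approximate contact.
```

Cutting every clip relative to the same geometric event, rather than at a fixed timestamp, is what would make sequences comparable across footage with different camera angles and run-up lengths.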