Game State and Spatio-temporal Action Detection in Soccer using Graph Neural Networks and 3D Convolutional Networks

📅 2025-02-21
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing monocular football video analysis methods for detecting ball-related events (e.g., passes, shots) rely heavily on costly manual annotations and lack explicit modeling of match context. To address this, we propose a semantic-aware spatiotemporal action detection framework. Our method explicitly encodes dynamic match state—including player positions, velocities, and team affiliations—as a temporal graph structure, which is jointly optimized end-to-end with visual features extracted via 3D convolutional networks. A graph neural network (GNN) models player interactions to capture tactical semantics. Experiments on real-world match footage demonstrate significant improvements in spatiotemporal action detection mean Average Precision (mAP), confirming that match-state information provides substantial predictive gain over vision-only baselines. This work establishes a new paradigm for low-cost, high-accuracy sports understanding by unifying geometric, kinematic, and strategic cues within a learnable graph-structured representation.

Technology Category

Application Category

📝 Abstract
Soccer analytics rely on two data sources: the player positions on the pitch and the sequences of events they perform. With around 2000 ball events per game, their precise and exhaustive annotation based on a monocular video stream remains a tedious and costly manual task. While state-of-the-art spatio-temporal action detection methods show promise for automating this task, they lack contextual understanding of the game. Assuming professional players' behaviors are interdependent, we hypothesize that incorporating surrounding players' information such as positions, velocity and team membership can enhance purely visual predictions. We propose a spatio-temporal action detection approach that combines visual and game state information via Graph Neural Networks trained end-to-end with state-of-the-art 3D CNNs, demonstrating improved metrics through game state integration.
Problem

Research questions and friction points this paper is trying to address.

Automating soccer action detection using video data.
Enhancing detection with player context and game state.
Integrating Graph Neural Networks with 3D CNNs.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Graph Neural Networks
3D Convolutional Networks
Spatio-temporal Action Detection
🔎 Similar Papers
No similar papers found.
J
Jeremie Ochin
Centre for Robotics, MINES Paris - PSL, France
G
Guillaume Devineau
Footovision, France
Bogdan Stanciulescu
Bogdan Stanciulescu
Mines ParisTech
computer visionmachine learningrobotics
S
Sotiris Manitsaris
Centre for Robotics, MINES Paris - PSL, France