FOOTPASS: A Multi-Modal Multi-Agent Tactical Context Dataset for Play-by-Play Action Spotting in Soccer Broadcast Videos

📅 2025-11-20
🤖 AI Summary
Existing soccer video understanding methods struggle to generate reliable, fine-grained play-by-play annotations automatically: action recognition lacks tactical grounding, and the outputs of computer-vision tasks (e.g., tracking, player identification) are not connected to long-term tactical patterns. This paper introduces FOOTPASS, a multi-modal, multi-agent dataset and the first benchmark for play-by-play action spotting over entire soccer matches. It is designed for player-centric methods that combine multi-object tracking, player identification, and spatiotemporal action detection with tactical priors over long time horizons. Contributions include: (1) the first full-match, tactically aware benchmark for play-by-play action spotting; (2) a setting for developing player-centric, multi-agent action spotting methods; and (3) a path toward more automated and reliable play-by-play data streams for data-driven soccer analytics.

📝 Abstract
Soccer video understanding has motivated the creation of datasets for tasks such as temporal action localization, spatiotemporal action detection (STAD), and multi-object tracking (MOT). The annotation of structured sequences of events (who does what, when, and where) used for soccer analytics requires a holistic approach that integrates both STAD and MOT. However, current action recognition methods remain insufficient for constructing reliable play-by-play data and are typically used to assist rather than fully automate annotation. Parallel research has advanced tactical modeling, trajectory forecasting, and performance analysis, all grounded in game-state and play-by-play data. This motivates leveraging tactical knowledge as a prior to support computer-vision-based predictions, enabling more automated and reliable extraction of play-by-play data. We introduce the Footovision Play-by-Play Action Spotting in Soccer Dataset (FOOTPASS), the first benchmark for play-by-play action spotting over entire soccer matches in a multi-modal, multi-agent tactical context. It enables the development of methods for player-centric action spotting that exploit both outputs from computer-vision tasks (e.g., tracking, identification) and prior knowledge of soccer, including its tactical regularities over long time horizons, to generate reliable play-by-play data streams. These streams form an essential input for data-driven sports analytics.
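To make the "who does what, when, and where" structure concrete, the following is a minimal sketch of how a single play-by-play event and an ordered event stream might be represented. The field names, the action labels ("pass", "shot"), and the pitch coordinate convention are all illustrative assumptions; the abstract does not specify the dataset's actual schema or class list.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class PlayByPlayEvent:
    """One hypothetical play-by-play record (not the FOOTPASS schema)."""
    frame: int                      # when: video frame index
    team: str                       # who: team label (assumed "home"/"away")
    jersey_number: int              # who: player identity within the team
    action: str                     # what: action class (labels are assumed)
    position: Tuple[float, float]   # where: pitch coordinates (assumed metres)


def to_stream(events: List[PlayByPlayEvent]) -> List[PlayByPlayEvent]:
    """Order events by frame so they read as a play-by-play sequence."""
    return sorted(events, key=lambda e: e.frame)


# Two made-up events, given out of order on purpose.
events = [
    PlayByPlayEvent(250, "home", 10, "shot", (88.0, 30.5)),
    PlayByPlayEvent(120, "home", 8, "pass", (52.0, 34.0)),
]
stream = to_stream(events)
for e in stream:
    print(e.frame, e.team, e.jersey_number, e.action, e.position)
```

Keeping the player identity and the action in the same record is what ties the tracking and identification outputs to the action label, which is the integration of STAD and MOT that the abstract calls for.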

Problem

Research questions and friction points this paper is trying to address.

Automating play-by-play action spotting in soccer broadcast videos
Integrating computer vision with tactical knowledge for reliable data
Creating multi-modal multi-agent dataset for player-centric action recognition

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-modal multi-agent tactical context dataset
Leveraging tactical knowledge for computer-vision predictions
Player-centric action spotting with long-term tactical regularities
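
As a rough illustration of what using tactical knowledge as a prior for vision predictions could look like, the sketch below reweights per-class scores from a vision model by a prior distribution and renormalizes. This is a generic Bayesian-style fusion chosen for illustration, not the method of the paper; the function name `fuse_with_prior`, the action labels, and all numbers are hypothetical.

```python
from typing import Dict


def fuse_with_prior(
    vision_scores: Dict[str, float],
    prior: Dict[str, float],
) -> Dict[str, float]:
    """Multiply per-class vision scores by a prior and renormalize.

    Classes missing from the prior get weight 0. If every product is 0,
    the vision scores are returned unchanged as a fallback.
    """
    unnormalized = {
        action: score * prior.get(action, 0.0)
        for action, score in vision_scores.items()
    }
    total = sum(unnormalized.values())
    if total == 0.0:
        return dict(vision_scores)
    return {action: value / total for action, value in unnormalized.items()}


# Hypothetical case: the vision model cannot separate two classes,
# while the (assumed) tactical context makes a pass far more likely.
vision = {"pass": 0.5, "shot": 0.5}
prior = {"pass": 0.9, "shot": 0.1}
posterior = fuse_with_prior(vision, prior)
print(max(posterior, key=posterior.get))
```

In this toy case the prior breaks a tie that the visual evidence alone cannot resolve. A prior over long time horizons would in practice come from a model of the game state rather than from fixed numbers like these.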