🤖 AI Summary
High-resolution tracking data for continuous adversarial sports (e.g., soccer, ice hockey) is scarce and expensive to acquire, severely limiting AI-driven analytics. Method: This paper introduces a simulation-based data generation framework built upon the Google Research Football environment. It constructs the first large-scale, open-source soccer tracking dataset featuring schema compatibility with real-world data and spatiotemporal consistency, accompanied by standardized kinematic feature extraction and event detection pipelines. Contribution/Results: Empirical evaluation on ball possession identification and pass prediction demonstrates that models trained on this synthetic data achieve performance within 5% error margin of those trained on real data. The framework significantly enhances reproducibility and scalability in sports AI research. Its core innovation lies in establishing a publicly available, structurally aligned, and task-ready paradigm for simulated tracking data.
📝 Abstract
Advanced analytics have transformed how sports teams operate, particularly in episodic sports like baseball. Their impact on continuous invasion sports, such as soccer and ice hockey, has been limited due to increased game complexity and restricted access to high-resolution game tracking data. In this demo, we present a method to collect and utilize simulated soccer tracking data from the Google Research Football environment to support the development of models designed for continuous tracking data. The data is stored in a schema that is representative of real tracking data and we provide processes that extract high-level features and events. We include examples of established tracking data models to showcase the efficacy of the simulated data. We address the scarcity of publicly available tracking data, providing support for research at the intersection of artificial intelligence and sports analytics.