🤖 AI Summary
Current football vision understanding research relies on isolated single-task models, which struggle to jointly achieve fine-grained perception (e.g., player detection) and high-level semantic reasoning (e.g., event classification). To address this, we propose SoccerMaster, the first vision foundation model tailored for football understanding. It introduces a football-specific multi-task supervised pretraining paradigm that unifies modeling across vision tasks of different granularities. Methodologically, we design an automated spatial annotation pipeline and construct SoccerFactory, a large-scale pretraining dataset built by combining data cleaning, spatial label generation, and fusion with existing soccer video datasets. Extensive experiments show that SoccerMaster consistently outperforms dedicated single-task expert models across diverse downstream tasks, demonstrating both the breadth of tasks it covers and its performance advantage. These results support the effectiveness and practicality of a dedicated vision foundation model for football understanding.
📝 Abstract
Soccer understanding has recently attracted growing research interest due to its domain-specific complexity and unique challenges. Unlike prior works that typically rely on isolated, task-specific expert models, this work proposes a unified model that handles diverse soccer visual understanding tasks, ranging from fine-grained perception (e.g., athlete detection) to semantic reasoning (e.g., event classification). Specifically, our contributions are threefold: (i) we present SoccerMaster, the first soccer-specific vision foundation model, which unifies diverse understanding tasks within a single framework via supervised multi-task pretraining; (ii) we develop an automated data curation pipeline that generates scalable spatial annotations, and we integrate these annotations with various existing soccer video datasets to construct SoccerFactory, a comprehensive pretraining data resource; and (iii) we conduct extensive evaluations showing that SoccerMaster consistently outperforms task-specific expert models across diverse downstream tasks, highlighting both its breadth and its superior performance. The data, code, and model will be made publicly available.