🤖 AI Summary
Traditional football analysis relies predominantly on single-modality data, which limits holistic semantic modeling of matches. To address this, we propose SoccerChat, a multimodal conversational AI framework for football understanding that combines video frames (annotated with jersey colors), ASR-derived speech transcripts, and structured instructions in a single instruction-tuning paradigm. Our method integrates object detection, Vision Transformers (ViT), large language models (LLMs), and ASR on top of the SoccerNet dataset, targeting interpretable and interactive match parsing. Evaluated on action classification and referee decision-making tasks, it demonstrates general event comprehension while maintaining competitive accuracy in referee decision making. Key contributions include a football-specific multimodal alignment mechanism and a structured video-instruction fine-tuning technique.
📝 Abstract
The integration of artificial intelligence into sports analytics has transformed soccer video understanding, enabling real-time, automated insights into complex game dynamics. Traditional approaches rely on isolated data streams, which limits their ability to capture the full context of a match. To address this, we introduce SoccerChat, a multimodal conversational AI framework that integrates visual and textual data for enhanced soccer video comprehension. Leveraging the extensive SoccerNet dataset, enriched with jersey color annotations and automatic speech recognition (ASR) transcripts, SoccerChat is fine-tuned on a structured video instruction dataset to support accurate game understanding, event classification, and referee decision making. We benchmark SoccerChat on action classification and referee decision-making tasks, showing that it handles general soccer event comprehension while maintaining competitive accuracy in referee decision making. Our findings highlight the importance of multimodal integration in advancing soccer analytics, paving the way for more interactive and explainable AI-driven sports analysis. Project repository: https://github.com/simula/SoccerChat