🤖 AI Summary
Automatically generated soccer commentary often falls short of live-broadcast requirements because of anonymous entity references, context-dependent errors, and a lack of match statistics. This work proposes GameSight, a two-stage framework that treats commentary generation as a knowledge-enhanced visual reasoning task: it first aligns anonymous players and teams through fine-grained visual and contextual analysis, then refines the entity-aligned commentary with external historical statistics and an iteratively updated internal game state. On the SN-Caption-test-align dataset, GameSight improves player alignment accuracy by 18.5% compared to Gemini 2.5 Pro. With knowledge enhancement, it also achieves better segment-level accuracy and commentary quality, as well as better game-level contextual relevance and structural composition.
📝 Abstract
Soccer commentary plays a crucial role in enhancing the viewing experience for audiences. Previous studies in automatic soccer commentary generation typically adopt an end-to-end method to generate anonymous live text commentary. Such commentary falls short of real-world live televised commentary: it contains anonymous entities and context-dependent errors, and it lacks statistical insights into game events. To bridge this gap, we propose GameSight, a two-stage model that treats soccer commentary generation as a knowledge-enhanced visual reasoning task, enabling knowledgeable, live-television-style commentary with accurate references to entities (players and teams). GameSight first performs visual reasoning to align anonymous entities through fine-grained visual and contextual analysis. The entity-aligned commentary is then refined with knowledge by incorporating external historical statistics and iteratively updated internal game state information. GameSight improves player alignment accuracy by 18.5% on the SN-Caption-test-align dataset compared to Gemini 2.5 Pro. Combined with further knowledge enhancement, GameSight also achieves better segment-level accuracy and commentary quality, as well as better game-level contextual relevance and structural composition. We believe that our work paves the way for a more informative and engaging human-centric experience with AI sports applications. Demo Page: https://gamesight2025.github.io/gamesight2025
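The two-stage flow described in the abstract can be illustrated with a toy sketch. This is not the GameSight implementation: the visual reasoning step is stubbed out (resolved names are passed in directly), and every name here (`GameState`, `align_entities`, `enhance_with_knowledge`, the `[PLAYER]` and `[TEAM]` placeholders, the `history` dictionary and its values) is a hypothetical choice made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class GameState:
    """Hypothetical internal game state, updated after every segment."""
    goals: dict = field(default_factory=dict)  # player name -> goals so far

    def update(self, player: str, event: str) -> None:
        if event == "goal":
            self.goals[player] = self.goals.get(player, 0) + 1


def align_entities(anonymous_text: str, player: str, team: str) -> str:
    """Stage 1 (stubbed): replace anonymous placeholders with resolved names.

    A real system would infer the names from the video; this sketch simply
    receives them as arguments.
    """
    return anonymous_text.replace("[PLAYER]", player).replace("[TEAM]", team)


def enhance_with_knowledge(text: str, player: str, event: str,
                           state: GameState, history: dict) -> str:
    """Stage 2: update the game state, then append statistical notes."""
    state.update(player, event)
    notes = []
    if event == "goal":
        # Internal knowledge: derived from the running game state.
        notes.append(
            f"That is goal number {state.goals[player]} of the match for {player}."
        )
    if player in history:
        # External knowledge: made-up historical statistics.
        notes.append(
            f"{player} came into this game with {history[player]} league goals."
        )
    return " ".join([text] + notes)


state = GameState()
history = {"Player A": 12}  # invented numbers, for illustration only
segment = align_entities(
    "[PLAYER] ([TEAM]) scores from close range.", "Player A", "Team X"
)
commentary = enhance_with_knowledge(segment, "Player A", "goal", state, history)
print(commentary)
```

Because `state` persists across calls, a second goal by the same player would be reported as goal number 2, which is the sense in which the internal game state is iteratively updated.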