COACH: Collaborative Agents for Contextual Highlighting - A Multi-Agent Framework for Sports Video Analysis

📅 2025-12-01
🤖 AI Summary
Current end-to-end models for sports video analysis suffer from weak temporal hierarchical modeling, poor generalization, high task-specific adaptation costs, and limited interpretability. To address these limitations, we propose a reconfigurable multi-agent system in which each agent is treated as a cognitive tool. The system orchestrates specialized agents (a temporal reasoning agent, an event detection module, and a generative summarization model) in a collaborative, role-based manner. It supports dynamic workflow orchestration across temporal scales, from micro-level actions to macro-level strategies, and across semantic levels. Its modular architecture allows iterative invocation and flexible reconfiguration, which improves generalization, interpretability, and cross-task extensibility. Evaluated on a badminton video dataset, the framework handles both fine-grained shot-level question answering and holistic match summarization within a single system.

📝 Abstract
Intelligent sports video analysis demands a comprehensive understanding of temporal context, from micro-level actions to macro-level game strategies. Existing end-to-end models often struggle with this temporal hierarchy, offering solutions that lack generalization, incur high development costs for new tasks, and suffer from poor interpretability. To overcome these limitations, we propose a reconfigurable Multi-Agent System (MAS) as a foundational framework for sports video understanding. In our system, each agent functions as a distinct "cognitive tool" specializing in a specific aspect of analysis. The system's architecture is not confined to a single temporal dimension or task. By leveraging iterative invocation and flexible composition of these agents, our framework can construct adaptive pipelines for both short-term analytic reasoning (e.g., Rally QA) and long-term generative summarization (e.g., match summaries). We demonstrate the adaptability of this framework using two representative tasks in badminton analysis, showcasing its ability to bridge fine-grained event detection and global semantic organization. This work presents a paradigm shift towards a flexible, scalable, and interpretable system for robust, cross-task sports video intelligence. The project homepage is available at https://aiden1020.github.io/COACH-project-page
Problem

Research questions and friction points this paper is trying to address.

Addresses temporal hierarchy challenges in sports video analysis
Overcomes generalization, cost, and interpretability issues in existing models
Proposes a flexible multi-agent system for cross-task video intelligence
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-Agent System for flexible sports video analysis
Agents act as specialized cognitive tools for distinct tasks
Iterative agent composition enables adaptive reasoning and summarization
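
The idea of composing agents into task-specific pipelines can be illustrated with a minimal Python sketch. This is not the authors' implementation: every name here (`detect_shots`, `group_rallies`, `answer_rally_question`, `summarize_match`, `compose`) and the toy event data are hypothetical, and each "agent" is a plain function standing in for what would be a model-backed component. The sketch only shows how the same shared agents might be recombined for a short-term QA task and a long-term summarization task; it does not model iterative invocation.

```python
from typing import Any, Callable, Dict

# Assumption: agents exchange a shared context dictionary.
Context = Dict[str, Any]
Agent = Callable[[Context], Context]


def detect_shots(ctx: Context) -> Context:
    # Hypothetical event-detection agent: turns raw (rally, type) pairs into shot records.
    shots = [{"rally": rally, "type": shot_type} for rally, shot_type in ctx["raw_events"]]
    return {**ctx, "shots": shots}


def group_rallies(ctx: Context) -> Context:
    # Hypothetical temporal agent: groups shot types by rally id, preserving order.
    rallies: Dict[int, list] = {}
    for shot in ctx["shots"]:
        rallies.setdefault(shot["rally"], []).append(shot["type"])
    return {**ctx, "rallies": rallies}


def answer_rally_question(ctx: Context) -> Context:
    # Hypothetical QA agent: counts how often a shot type occurs in one rally.
    shot_types = ctx["rallies"][ctx["rally_id"]]
    return {**ctx, "answer": shot_types.count(ctx["shot_type"])}


def summarize_match(ctx: Context) -> Context:
    # Hypothetical summarization agent: one line per rally, in rally order.
    lines = [
        f"Rally {rally_id}: {len(types)} shots, ended with {types[-1]}"
        for rally_id, types in sorted(ctx["rallies"].items())
    ]
    return {**ctx, "summary": "\n".join(lines)}


def compose(*agents: Agent) -> Agent:
    # Chain agents so each one receives the context produced by the previous one.
    def pipeline(ctx: Context) -> Context:
        for agent in agents:
            ctx = agent(ctx)
        return ctx
    return pipeline


# Two pipelines that reuse the same upstream agents.
rally_qa = compose(detect_shots, group_rallies, answer_rally_question)
match_summary = compose(detect_shots, group_rallies, summarize_match)

events = [(1, "serve"), (1, "clear"), (1, "smash"),
          (2, "serve"), (2, "smash"), (2, "smash")]

qa = rally_qa({"raw_events": events, "rally_id": 2, "shot_type": "smash"})
summary = match_summary({"raw_events": events})

print(qa["answer"])
print(summary["summary"])
```

Because both pipelines share `detect_shots` and `group_rallies`, adding a new task in this sketch means writing one new terminal agent rather than a new end-to-end model, which is the kind of reuse the framework description points to.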