FineBadminton: A Multi-Level Dataset for Fine-Grained Badminton Video Understanding

📅 2025-08-10
🤖 AI Summary
Problem: Fine-grained understanding of high-speed sports such as badminton remains challenging because of their spatio-temporal complexity and the scarcity of high-quality, domain-specific annotated data. Method: This work introduces FineBadminton, a dataset with multi-level annotations covering foundational actions, tactical semantics, and decision evaluation, together with FBBench, a benchmark derived from it for badminton video understanding. The dataset is built with a human-in-the-loop annotation pipeline in which multimodal large language models (MLLMs) generate candidate annotations that human annotators then refine. A baseline approach adds Hit-Centric Keyframe Selection and Coordinate-Guided Condensation. Contribution/Results: Evaluation on FBBench shows that current MLLMs still struggle with spatio-temporal reasoning and tactical analysis, while the proposed strategies achieve substantial performance gains. Together, the dataset, benchmark, and baseline provide labeled resources and a reference point for fine-grained sports intelligence.

📝 Abstract
Fine-grained analysis of complex and high-speed sports like badminton presents a significant challenge for Multimodal Large Language Models (MLLMs), despite their notable advancements in general video understanding. This difficulty arises primarily from the scarcity of datasets with sufficiently rich and domain-specific annotations. To bridge this gap, we introduce FineBadminton, a novel and large-scale dataset featuring a unique multi-level semantic annotation hierarchy (Foundational Actions, Tactical Semantics, and Decision Evaluation) for comprehensive badminton understanding. The construction of FineBadminton is powered by an innovative annotation pipeline that synergistically combines MLLM-generated proposals with human refinement. We also present FBBench, a challenging benchmark derived from FineBadminton, to rigorously evaluate MLLMs on nuanced spatio-temporal reasoning and tactical comprehension. Together, FineBadminton and FBBench provide a crucial ecosystem to catalyze research in fine-grained video understanding and advance the development of MLLMs in sports intelligence. Furthermore, we propose an optimized baseline approach incorporating Hit-Centric Keyframe Selection to focus on pivotal moments and Coordinate-Guided Condensation to distill salient visual information. The results on FBBench reveal that while current MLLMs still face significant challenges in deep sports video analysis, our proposed strategies nonetheless achieve substantial performance gains. The project homepage is available at https://finebadminton.github.io/FineBadminton/.
Problem

Research questions and friction points this paper is trying to address.

Lack of domain-specific datasets with rich annotations for badminton video analysis
Absence of multi-level semantic annotations needed for comprehensive sports understanding
No challenging benchmark for evaluating MLLMs on spatio-temporal and tactical reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-level semantic annotation hierarchy for badminton
MLLM-generated proposals with human refinement pipeline
Hit-Centric Keyframe Selection and Coordinate-Guided Condensation
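
The idea behind Hit-Centric Keyframe Selection, focusing on the frames around each hit, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the `window`, `stride`, and `budget` parameters, and the tie-breaking rule are all assumptions made for illustration, and hit frame indices are assumed to be available from annotations or a detector.

```python
from typing import List


def select_hit_centric_keyframes(
    hit_frames: List[int],
    num_frames: int,
    window: int = 2,
    stride: int = 1,
    budget: int = 16,
) -> List[int]:
    """Pick frame indices clustered around hit moments (illustrative only).

    hit_frames: frame indices where a stroke makes contact (assumed given).
    num_frames: total number of frames in the clip.
    window:     how many sampled frames to keep on each side of a hit.
    stride:     spacing, in frames, between sampled neighbours.
    budget:     maximum number of frames passed on to the model.
    """
    candidates = set()
    for hit in hit_frames:
        for offset in range(-window, window + 1):
            idx = hit + offset * stride
            if 0 <= idx < num_frames:  # drop indices outside the clip
                candidates.add(idx)

    ordered = sorted(candidates)
    if len(ordered) <= budget:
        return ordered

    # Over budget: keep the frames closest to any hit, then restore time order.
    def distance_to_nearest_hit(idx: int) -> int:
        return min(abs(idx - h) for h in hit_frames)

    kept = sorted(ordered, key=lambda i: (distance_to_nearest_hit(i), i))[:budget]
    return sorted(kept)
```

For a 100-frame rally with hits at frames 10 and 30, `select_hit_centric_keyframes([10, 30], 100, window=1)` keeps the three frames centred on each hit rather than sampling the clip uniformly.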
Xusheng He
Harbin Institute of Technology, Shenzhen
Wei Liu
Harbin Institute of Technology, Shenzhen
Shanshan Ma
China Electronics Standardization Institute
Qian Liu
Shandong University
Chenghao Ma
China Electronics Standardization Institute
Jianlong Wu
Professor, Harbin Institute of Technology (Shenzhen)
Computer Vision, Multimodal Learning