BFMD: A Full-Match Badminton Dense Dataset for Dense Shot Captioning

📅 2026-03-26

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing badminton datasets are mostly limited to short clips or single-task annotations and lack dense, multimodal annotations spanning entire matches, which hinders accurate shot captioning and match-level tactical analysis. To address this gap, this work introduces BFMD, the first full-match multimodal badminton dataset, covering 19 broadcast matches, 1,687 rallies, and 16,751 annotated hit events. It proposes a VideoMAE-based multimodal captioning framework that integrates heterogeneous cues (shot type, shuttle trajectory, and player pose) and adds a Semantic Feedback mechanism to improve semantic consistency. Experiments show that multimodal modeling and semantic feedback improve shot caption quality over RGB-only baselines, and an analysis of full matches illustrates how tactical patterns evolve over time.

📝 Abstract

Understanding tactical dynamics in badminton requires analyzing entire matches rather than isolated clips. However, existing badminton datasets mainly focus on short clips or task-specific annotations and rarely provide full-match data with dense multimodal annotations. This limitation makes it difficult to generate accurate shot captions and perform match-level analysis. To address this limitation, we introduce the first Badminton Full Match Dense (BFMD) dataset, with 19 broadcast matches (including both singles and doubles) covering over 20 hours of play, comprising 1,687 rallies and 16,751 hit events, each annotated with a shot caption. The dataset provides hierarchical annotations including match segments, rally events, and dense rally-level multimodal annotations such as shot types, shuttle trajectories, player pose keypoints, and shot captions. We develop a VideoMAE-based multimodal captioning framework with a Semantic Feedback mechanism that leverages shot semantics to guide caption generation and improve semantic consistency. Experimental results demonstrate that multimodal modeling and semantic feedback improve shot caption quality over RGB-only baselines. We further showcase the potential of BFMD by analyzing the temporal evolution of tactical patterns across full matches.
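The hierarchy described in the abstract (match, match segments, rallies, per-hit multimodal annotations) can be pictured with a small data model. The sketch below is illustrative only: the class names, field names, and coordinate layout are assumptions made for this example and are not the dataset's actual schema or file format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) in image coordinates; an assumed convention


@dataclass
class Hit:
    """One hit event with its dense annotations (field names are hypothetical)."""
    frame: int                                   # frame index of the hit
    shot_type: str                               # e.g. "smash", "clear"
    caption: str                                 # natural-language shot caption
    shuttle_track: List[Point] = field(default_factory=list)
    pose_keypoints: List[List[Point]] = field(default_factory=list)  # one list per player


@dataclass
class Rally:
    """A rally event: a contiguous span of play containing ordered hits."""
    start_frame: int
    end_frame: int
    hits: List[Hit] = field(default_factory=list)


@dataclass
class MatchSegment:
    """A match-level segment (for example, one game) grouping rallies."""
    label: str
    rallies: List[Rally] = field(default_factory=list)


@dataclass
class Match:
    match_id: str
    discipline: str                              # "singles" or "doubles"
    segments: List[MatchSegment] = field(default_factory=list)

    def rally_count(self) -> int:
        return sum(len(s.rallies) for s in self.segments)

    def hit_count(self) -> int:
        return sum(len(r.hits) for s in self.segments for r in s.rallies)


def mean_hits_per_rally(total_hits: int, total_rallies: int) -> float:
    """Average rally length in hits, from dataset-level totals."""
    return total_hits / total_rallies


# Toy example with made-up annotations, only to show how the levels nest.
demo = Match(
    match_id="demo-001",
    discipline="singles",
    segments=[
        MatchSegment(
            label="game 1",
            rallies=[
                Rally(
                    start_frame=0,
                    end_frame=120,
                    hits=[
                        Hit(frame=10, shot_type="serve", caption="Short serve to the front court."),
                        Hit(frame=55, shot_type="lift", caption="Defensive lift to the rear court."),
                    ],
                )
            ],
        )
    ],
)
```

Applying `mean_hits_per_rally` to the totals reported in the abstract (16,751 hit events over 1,687 rallies) gives an average of roughly 9.9 hits per rally.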

Problem

Research questions and friction points this paper is trying to address.

badminton

dense captioning

full-match dataset

multimodal annotation

shot captioning

Innovation

Methods, ideas, or system contributions that make the work stand out.

dense shot captioning

multimodal annotation

VideoMAE