RacketVision: A Multiple Racket Sports Benchmark for Unified Ball and Racket Analysis

📅 2025-11-21
🤖 AI Summary
This work addresses three core challenges in racket-sport analytics—fine-grained ball tracking, joint-level racket pose estimation, and ball trajectory prediction—across badminton, tennis, and table tennis. To this end, we introduce the first large-scale, multi-view video dataset covering all three sports, featuring fine-grained annotations of 3D ball center positions and multi-joint racket poses. We propose a unified multi-task benchmark and identify that naive concatenation of multimodal features degrades performance. Accordingly, we design a Cross-Attention mechanism to enable adaptive fusion of visual and geometric features. Experiments demonstrate that our approach significantly outperforms strong unimodal baselines on trajectory prediction, validating the efficacy of cross-modal collaborative modeling. Our dataset, benchmark, and method establish a new paradigm and reproducible foundation for multimodal learning in sports analytics.

📝 Abstract
We introduce RacketVision, a novel dataset and benchmark for advancing computer vision in sports analytics, covering table tennis, tennis, and badminton. The dataset is the first to provide large-scale, fine-grained annotations for racket pose alongside traditional ball positions, enabling research into complex human-object interactions. It is designed to tackle three interconnected tasks: fine-grained ball tracking, articulated racket pose estimation, and predictive ball trajectory forecasting. Our evaluation of established baselines reveals a critical insight for multi-modal fusion: while naively concatenating racket pose features degrades performance, a CrossAttention mechanism is essential to unlock their value, leading to trajectory prediction results that surpass strong unimodal baselines. RacketVision provides a versatile resource and a strong starting point for future research in dynamic object tracking, conditional motion forecasting, and multimodal analysis in sports. Project page at https://github.com/OrcustD/RacketVision
Problem

Research questions and friction points this paper is trying to address.

Unified ball and racket tracking across multiple sports disciplines
Articulated racket pose estimation with fine-grained annotations
Multimodal trajectory prediction through effective feature fusion
Innovation

Methods, ideas, or system contributions that make the work stand out.

CrossAttention mechanism for multimodal fusion
Fine-grained racket pose and ball tracking
Unified dataset for multiple racket sports
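
The contrast between naive concatenation and cross-attention fusion can be illustrated with a minimal sketch. This is not the authors' architecture: it assumes ball-trajectory features act as queries and racket-pose features as keys and values, omits the learned projection matrices a real model would use, and the function names cross_attention_fuse and concat_fuse are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def concat_fuse(ball_feats, racket_feats):
    """Naive fusion: append the racket vector to the ball vector, frame by frame."""
    return [b + r for b, r in zip(ball_feats, racket_feats)]


def cross_attention_fuse(ball_feats, racket_feats):
    """Single-head scaled dot-product cross-attention with a residual connection.

    Assumption (not from the paper): ball features are the queries and
    racket features are both keys and values; no learned projections.
    """
    dim = len(ball_feats[0])
    fused = []
    for query in ball_feats:
        # Similarity of this ball feature to every racket feature.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in racket_feats
        ]
        weights = softmax(scores)
        # Weighted average of racket features: the attended context.
        context = [
            sum(w * value[j] for w, value in zip(weights, racket_feats))
            for j in range(dim)
        ]
        # Residual add keeps the original ball signal intact.
        fused.append([q + c for q, c in zip(query, context)])
    return fused


if __name__ == "__main__":
    ball = [[1.0, 0.0], [0.0, 1.0]]                 # two frames of ball features
    racket = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # three racket-pose features
    print(cross_attention_fuse(ball, racket))
    print(concat_fuse(ball, racket))
```

In this sketch the fused output keeps the dimensionality of the ball features and weights each racket feature by its relevance to the current ball state, whereas concatenation simply widens the vector and leaves the downstream model to sort out which racket dimensions matter.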
Authors
Linfeng Dong
Zhejiang University, Shanghai AI Laboratory
Yuchen Yang
Fudan University, Shanghai AI Laboratory
Hao Wu
University of Science and Technology of China, Shanghai AI Laboratory
Wei Wang
Shanghai AI Laboratory
Yuenan Hou
Shanghai AI Laboratory
Autonomous Driving, Embodied AI, Efficient Learning
Zhihang Zhong
Researcher, Shanghai AI Laboratory
Computer vision, Deep learning
Xiao Sun
Shanghai AI Laboratory