KASportsFormer: Kinematic Anatomy Enhanced Transformer for 3D Human Pose Estimation on Short Sports Scene Video

📅 2025-07-28
🤖 AI Summary
3D human pose estimation in sports scenes is hampered by complex motions, motion blur, occlusions, and the difficulty of modeling brief but critical actions (e.g., kicking). To address this, the paper proposes an anatomy-aware Transformer framework that incorporates biomechanical priors from sports anatomy through two new modules: BoneExt (bone kinematics enhancement) and LimbFus (limb-level feature fusion). These modules explicitly encode short-term, high-acceleration motion constraints to improve robustness under occlusion and blur. The framework jointly optimizes feature representation via skeletal extraction, multi-granularity limb fusion, and multimodal encoding. Evaluated on the SportsPose and WorldPose benchmarks, it achieves state-of-the-art MPJPE of 58.0 mm and 34.3 mm, respectively.

📝 Abstract
Recent transformer-based approaches have demonstrated impressive performance on real-world 3D human pose estimation. Although these approaches achieve strong results on benchmark datasets, they tend to fall short in sports scenarios, where human movements are more complicated than daily-life actions and estimation is hindered by motion blur, occlusions, and domain shifts. Moreover, because critical motions in a sports game are often completed in a brief moment (e.g., shooting), the ability to focus on momentary actions is becoming a crucial factor in sports analysis, and current methods appear to struggle with such instantaneous scenarios. To overcome these limitations, we introduce KASportsFormer, a novel transformer-based 3D pose estimation framework for sports that incorporates a kinematic anatomy-informed feature representation and integration module. In this module, the inherent kinematic motion information is extracted with the Bone Extractor (BoneExt) and Limb Fuser (LimbFus) modules and encoded in a multimodal manner, which improves the capability of comprehending sports poses in short videos. We evaluate our method on two representative sports scene datasets: SportsPose and WorldPose. Experimental results show that our proposed method achieves state-of-the-art results, with MPJPE errors of 58.0 mm and 34.3 mm, respectively. Our code and models are available at: https://github.com/jw0r1n/KASportsFormer
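The MPJPE figures quoted above are mean per-joint position errors: the Euclidean distance between each predicted joint and its ground-truth position, averaged over all joints and frames. The following is a minimal sketch of that metric in plain Python, not the authors' evaluation code. The function name `mpjpe` and the nested-list input layout (frames, then joints, then coordinates in millimetres) are assumptions made for illustration, and any alignment step a benchmark protocol might apply before measuring (such as root-joint centering) is deliberately left out.

```python
import math


def mpjpe(pred, gt):
    """Mean per-joint position error over all frames and joints.

    pred, gt: sequences of frames; each frame is a sequence of joints;
    each joint is an (x, y, z) coordinate in the same unit (assumed mm).
    This layout is an illustrative assumption, not the paper's format.
    """
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same number of frames")

    total = 0.0
    count = 0
    for pred_frame, gt_frame in zip(pred, gt):
        if len(pred_frame) != len(gt_frame):
            raise ValueError("frames must have the same number of joints")
        for p, g in zip(pred_frame, gt_frame):
            total += math.dist(p, g)  # Euclidean distance for one joint
            count += 1

    if count == 0:
        raise ValueError("no joints to evaluate")
    return total / count


# Toy example: one frame, two joints, per-joint errors of 5 mm and 12 mm.
gt = [[(0.0, 0.0, 0.0), (100.0, 0.0, 0.0)]]
pred = [[(3.0, 4.0, 0.0), (100.0, 0.0, 12.0)]]
print(mpjpe(pred, gt))  # average of 5 and 12
```

Because the result is an average of distances, it is reported in the unit of the input coordinates, which is why the scores above are given in millimetres.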
Problem

Research questions and friction points this paper is trying to address.

Estimating 3D human poses in short sports videos with complex movements
Addressing motion blur, occlusions, and domain shifts in sports scenarios
Focusing on momentary actions critical for sports analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Kinematic anatomy-informed feature representation
Bone Extractor and Limb Fuser modules
Multimodal encoding for sports poses
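To make the idea of bone-level kinematic features concrete, here is a minimal sketch of how bone vectors, lengths, and directions can be derived from joint coordinates. It is an assumption-laden illustration of the general concept only: the summary does not describe the internals of BoneExt or LimbFus, so the function `bone_features`, the toy three-joint leg chain, and the choice of features are all hypothetical and should not be read as the authors' implementation.

```python
import math

# Hypothetical toy skeleton: (parent, child) joint index pairs for one leg.
# Real datasets define their own joint ordering and bone lists.
TOY_LEG_BONES = [(0, 1), (1, 2)]  # hip -> knee, knee -> ankle


def bone_features(joints, bones):
    """Return (vector, length, unit_direction) for each bone in one frame.

    joints: sequence of (x, y, z) joint positions.
    bones: sequence of (parent_index, child_index) pairs.
    """
    features = []
    for parent, child in bones:
        # Bone vector points from the parent joint to the child joint.
        vec = tuple(c - p for p, c in zip(joints[parent], joints[child]))
        length = math.sqrt(sum(v * v for v in vec))
        # Guard against degenerate bones where both joints coincide.
        if length > 0:
            direction = tuple(v / length for v in vec)
        else:
            direction = tuple(0.0 for _ in vec)
        features.append((vec, length, direction))
    return features


# Toy frame: hip at the origin, knee and ankle below and in front of it.
frame = [(0.0, 0.0, 0.0), (0.0, -3.0, 4.0), (0.0, -3.0, 10.0)]
for vec, length, direction in bone_features(frame, TOY_LEG_BONES):
    print(vec, length, direction)
```

Separating length from direction is a common design choice in skeleton-based models, since bone lengths stay roughly constant for a given person while directions carry the motion. Whether KASportsFormer uses this particular split is not stated in the summary.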
Zhuoer Yin
Nagoya University
Calvin Yeung
Nagoya University
Machine Learning, Computer Vision, Predictive Modeling, Sports Analytics
Tomohiro Suzuki
Nagoya University, Nagoya, Japan
Ryota Tanaka
Nagoya University, Nagoya, Japan
Keisuke Fujii
Nagoya University, Nagoya, Japan