🤖 AI Summary
This work addresses the joint inference of ball spin direction and the full 3D trajectory from monocular broadcast videos of table tennis, using only 2D ball-center trajectories and no real-world annotated data.
Method: We propose a novel physics-guided learning framework driven exclusively by synthetic data: high-fidelity synthetic videos are generated via rigid-body dynamics; a unified architecture integrates 2D trajectory encoding, geometry-aware feature enhancement, and physics-constrained inversion to jointly optimize spin classification and 3D trajectory regression in an end-to-end manner.
Contribution/Results: Crucially, the method eliminates reliance on real-data fine-tuning, achieving cross-domain generalization through principled physical modeling and augmentation. Experiments demonstrate state-of-the-art performance: 92.0% accuracy in spin-direction classification and a 2D reprojection error of only 0.19% of the image diagonal length.
📝 Abstract
Analyzing a player's technique in table tennis requires knowledge of the ball's 3D trajectory and spin. While the spin is not directly observable in standard broadcast videos, we show that it can be inferred from the ball's trajectory in the video. We present a novel method to infer the initial spin and 3D trajectory from the corresponding 2D trajectory in a video. Since ground truth labels are unavailable for broadcast videos, we train a neural network solely on synthetic data. Due to the choice of our input data representation, physically correct synthetic training data, and targeted augmentations, the network naturally generalizes to real data. Notably, these simple techniques suffice for generalization; no real data at all is required for training. To the best of our knowledge, we are the first to present a method for spin and trajectory prediction in simple monocular broadcast videos, achieving an accuracy of 92.0% in spin classification and a 2D reprojection error of 0.19% of the image diagonal.
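To make the "physically correct synthetic training data" idea concrete, here is a minimal sketch of how spinning-ball trajectories can be generated from rigid-body dynamics: gravity, quadratic air drag, and the Magnus force (which couples spin to trajectory curvature, and is what makes spin recoverable from the 2D path). The constants and the explicit-Euler integrator are standard textbook choices and illustrative assumptions, not the paper's exact simulation.

```python
import numpy as np

# Standard table tennis ball constants; CD and CM are assumed
# constant here for simplicity (real coefficients vary with speed/spin).
RHO = 1.225          # air density, kg/m^3
R = 0.02             # ball radius, m
M = 0.0027           # ball mass, kg
A = np.pi * R**2     # cross-sectional area, m^2
CD = 0.40            # drag coefficient (assumed)
CM = 0.60            # Magnus coefficient (assumed)
G = np.array([0.0, 0.0, -9.81])  # gravity, z up

def simulate(p0, v0, omega, dt=0.002, steps=300):
    """Explicit-Euler integration of a spinning ball's flight.
    omega is the (constant) spin vector in rad/s."""
    p = np.asarray(p0, dtype=float)
    v = np.asarray(v0, dtype=float)
    traj = [p.copy()]
    for _ in range(steps):
        speed = np.linalg.norm(v)
        f_drag = -0.5 * RHO * CD * A * speed * v          # opposes motion
        f_magnus = 0.5 * RHO * CM * A * R * np.cross(omega, v)  # spin lift
        a = G + (f_drag + f_magnus) / M
        v = v + dt * a
        p = p + dt * v
        traj.append(p.copy())
    return np.array(traj)

# With the ball moving in +x and z up, topspin corresponds to omega
# along +y; the Magnus term omega x v then points downward, so the
# topspin ball peaks lower than a no-spin ball with the same launch.
flat = simulate([0, 0, 0.3], [4, 0, 1], omega=[0, 0, 0])
topspin = simulate([0, 0, 0.3], [4, 0, 1], omega=[0, 150, 0])
```

Projecting such 3D trajectories through a camera model yields the 2D ball-center sequences used as network input; because different spins bend the path differently, the 2D trajectory carries the signal the network learns to invert.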