🤖 AI Summary
This paper addresses the problem of recovering 3D trajectories of a ball from monocular 2D tracking sequences, where depth ambiguity and viewpoint dependency pose fundamental challenges. To this end, the authors propose a camera-agnostic canonical 3D representation framework that integrates multi-level intermediate representations (2D → canonical 3D → camera-specific 3D) with reprojection consistency constraints and models temporal dynamics with an LSTM. The method is trained exclusively on synthetic data and still generalizes to real-world scenarios. Evaluated on four synthetic and three real-world datasets, it achieves state-of-the-art performance. Key contributions include: (1) a novel canonical 3D representation for ball trajectories that is independent of the camera's location; (2) an LSTM-based pipeline that requires no real-world 3D annotations for training; and (3) cross-domain generalization from simulation to real footage, opening up applications in sports analysis and virtual replay.
📝 Abstract
We present a method for 3D ball trajectory estimation from a 2D tracking sequence. To overcome the ambiguity in 3D-from-2D estimation, we design an LSTM-based pipeline built on two components: a novel canonical 3D representation that is independent of the camera's location, which lets the method handle arbitrary views, and a series of intermediate representations that encourage crucial invariance and reprojection consistency. We evaluate our method on four synthetic and three real datasets and conduct extensive ablation studies on our design choices. Despite training solely on simulated data, our method achieves state-of-the-art performance and can generalize to real-world scenarios with multiple trajectories, opening up a range of applications in sports analysis and virtual replay. Please visit our page: https://where-is-the-ball.github.io.
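The depth ambiguity and the reprojection consistency mentioned above can be illustrated with a minimal pinhole-camera sketch. This is not the paper's implementation: the function names `project` and `reprojection_error`, the focal length, and the principal point are all hypothetical values chosen for illustration, and the camera is assumed to be an ideal pinhole with points given in camera coordinates.

```python
from math import hypot

def project(point, focal=1000.0, cx=640.0, cy=360.0):
    """Pinhole projection of a camera-space point (x, y, z) to pixels.

    focal, cx, cy are illustrative intrinsics, not values from the paper.
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    return (focal * x / z + cx, focal * y / z + cy)

def reprojection_error(points_3d, track_2d):
    """Mean pixel distance between projected 3D estimates and the 2D track."""
    if len(points_3d) != len(track_2d) or not track_2d:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = []
    for point, (u, v) in zip(points_3d, track_2d):
        pu, pv = project(point)
        errors.append(hypot(u - pu, v - pv))
    return sum(errors) / len(errors)

# Depth ambiguity: a point scaled along its viewing ray hits the same pixel.
near = (0.5, -0.25, 5.0)
far = tuple(2.0 * c for c in near)
same_pixel = project(near) == project(far)

# A short hypothetical trajectory and its observed 2D track.
truth = [(0.5, -0.25, 5.0), (0.6, -0.30, 5.0), (0.7, -0.25, 5.0)]
track = [project(p) for p in truth]

# An estimate shifted sideways by 0.05 units no longer matches the track.
shifted = [(x + 0.05, y, z) for (x, y, z) in truth]
error_truth = reprojection_error(truth, track)
error_shifted = reprojection_error(shifted, track)
```

In this sketch the true trajectory reprojects onto the track with zero error, while the shifted estimate is penalized. Note that an estimate scaled along the viewing rays would also reproject perfectly, which is why reprojection consistency alone cannot resolve depth and additional cues, such as learned temporal dynamics, are needed.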