AI Summary
Accurate camera pose annotation for large-scale dynamic internet videos remains challenging due to pervasive motion blur, dynamic object interference, and lack of precise calibration.
Method: We introduce DynPose-100K, the first large-scale dataset comprising 100,000 real-world dynamic video sequences annotated with camera poses. To construct it, we propose a novel video curation pipeline integrating task-specific models with general foundation models. Our end-to-end pose estimation framework jointly incorporates dynamic object masking, optical flow-guided point tracking, and robust structure-from-motion (SfM) via bundle adjustment. Additionally, we employ multi-model collaborative filtering and temporal consistency modeling to enhance robustness.
Results: Extensive experiments demonstrate that our framework significantly outperforms existing methods in pose accuracy and cross-scene generalization. DynPose-100K provides high-fidelity, scalable pose supervision, enabling advancements in downstream applications such as photorealistic video generation and physics-based simulation.
Abstract
Annotating camera poses on dynamic Internet videos at scale is critical for advancing fields like realistic video generation and simulation. However, collecting such a dataset is difficult, as most Internet videos are unsuitable for pose estimation. Furthermore, annotating dynamic Internet videos presents significant challenges even for state-of-the-art methods. In this paper, we introduce DynPose-100K, a large-scale dataset of dynamic Internet videos annotated with camera poses. Our collection pipeline addresses filtering using a carefully combined set of task-specific and generalist models. For pose estimation, we combine the latest techniques of point tracking, dynamic masking, and structure-from-motion to achieve improvements over the state-of-the-art approaches. Our analysis and experiments demonstrate that DynPose-100K is both large-scale and diverse across several key attributes, opening up avenues for advancements in various downstream applications.