Dynamic Camera Poses and Where to Find Them

πŸ“… 2025-04-24
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Accurate camera pose annotation for large-scale dynamic internet videos remains challenging due to pervasive motion blur, dynamic object interference, and lack of precise calibration. Method: We introduce DynPose-100Kβ€”the first large-scale dataset comprising 100,000 real-world dynamic video sequences with ground-truth camera poses. To construct it, we propose a novel video curation pipeline integrating task-specific models with general foundation models. Our end-to-end pose estimation framework jointly incorporates dynamic object masking, optical flow-guided point tracking, and robust structure-from-motion (SfM) via bundle adjustment. Additionally, we employ multi-model collaborative filtering and temporal consistency modeling to enhance robustness. Results: Extensive experiments demonstrate that our framework significantly outperforms existing methods in pose accuracy and cross-scene generalization. DynPose-100K provides high-fidelity, scalable pose supervision, enabling advancements in downstream applications such as photorealistic video generation and physics-based simulation.

Technology Category

Application Category

πŸ“ Abstract
Annotating camera poses on dynamic Internet videos at scale is critical for advancing fields like realistic video generation and simulation. However, collecting such a dataset is difficult, as most Internet videos are unsuitable for pose estimation. Furthermore, annotating dynamic Internet videos present significant challenges even for state-of-theart methods. In this paper, we introduce DynPose-100K, a large-scale dataset of dynamic Internet videos annotated with camera poses. Our collection pipeline addresses filtering using a carefully combined set of task-specific and generalist models. For pose estimation, we combine the latest techniques of point tracking, dynamic masking, and structure-from-motion to achieve improvements over the state-of-the-art approaches. Our analysis and experiments demonstrate that DynPose-100K is both large-scale and diverse across several key attributes, opening up avenues for advancements in various downstream applications.
Problem

Research questions and friction points this paper is trying to address.

Annotating camera poses in dynamic Internet videos at scale
Filtering unsuitable videos for pose estimation effectively
Improving pose estimation accuracy with advanced techniques
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines task-specific and generalist models for filtering
Uses point tracking and dynamic masking for pose estimation
Applies structure-from-motion to improve accuracy
πŸ”Ž Similar Papers
No similar papers found.
C
C. Rockwell
University of Michigan
Joseph Tung
Joseph Tung
PhD Student, New York University
Computer Vision
Tsung-Yi Lin
Tsung-Yi Lin
Research Scientist, NVIDIA
Computer VisionMachine Learning
M
Ming-Yu Liu
NVIDIA
D
D. Fouhey
New York University
C
Chen-Hsuan Lin
NVIDIA