Dynamic Camera Poses and Where to Find Them

📅 2025-04-24

🤖 AI Summary

Problem: Accurately annotating camera poses for large-scale dynamic Internet videos remains challenging due to pervasive motion blur, interference from dynamic objects, and the lack of precise calibration. Method: We introduce DynPose-100K, a large-scale dataset of 100,000 real-world dynamic video sequences annotated with camera poses. To construct it, we propose a video curation pipeline that combines task-specific models with general-purpose foundation models. Our pose estimation framework combines dynamic object masking, point tracking, and robust structure-from-motion (SfM) with bundle adjustment. Combining multiple models for filtering and enforcing temporal consistency further improve robustness. Results: Experiments show that our framework outperforms existing methods in pose accuracy and generalizes across scenes. DynPose-100K provides high-quality, scalable pose supervision that can advance downstream applications such as realistic video generation and physics-based simulation.
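The dynamic-masking step can be illustrated with a minimal sketch: any point track that lands on a pixel flagged as dynamic in any frame is discarded, so only tracks on the static background feed structure-from-motion. The track and mask representations and the names `filter_static_tracks`, `is_dynamic`, and `min_visible` are hypothetical and not taken from DynPose-100K; the per-frame masks are assumed to come from some segmentation model.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical representations (not the paper's code):
# a track is a list of per-frame (x, y) positions, with None where the
# tracker reports the point as occluded; a mask is a 2D grid that is True
# on pixels belonging to moving objects.
Track = List[Optional[Tuple[float, float]]]
Mask = Sequence[Sequence[bool]]


def is_dynamic(mask: Mask, x: float, y: float) -> bool:
    """True if pixel (x, y) lies on a dynamic region or outside the image."""
    row, col = int(round(y)), int(round(x))
    if row < 0 or row >= len(mask) or col < 0 or col >= len(mask[0]):
        return True  # treat out-of-frame positions as unreliable
    return mask[row][col]


def filter_static_tracks(tracks: List[Track], masks: List[Mask],
                         min_visible: int = 2) -> List[int]:
    """Indices of tracks that never touch a dynamic mask and are visible
    in at least `min_visible` frames (two views are needed to triangulate)."""
    keep = []
    for i, track in enumerate(tracks):
        visible = [(t, p) for t, p in enumerate(track) if p is not None]
        if len(visible) < min_visible:
            continue
        if any(is_dynamic(masks[t], x, y) for t, (x, y) in visible):
            continue
        keep.append(i)
    return keep


if __name__ == "__main__":
    mask0 = [[False] * 4 for _ in range(4)]
    mask0[1][2] = True                      # a moving object at row 1, col 2
    mask1 = [[False] * 4 for _ in range(4)]
    tracks = [
        [(0.0, 0.0), (1.0, 0.0)],           # static background point
        [(2.0, 1.0), (2.0, 2.0)],           # hits the dynamic pixel in frame 0
        [(3.0, 3.0), None],                 # visible in only one frame
    ]
    print(filter_static_tracks(tracks, [mask0, mask1]))
```

Rejecting a whole track on a single dynamic hit is a deliberately conservative choice in this sketch: losing some static points is usually cheaper for SfM than admitting points that move with an object.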

📝 Abstract

Annotating camera poses on dynamic Internet videos at scale is critical for advancing fields like realistic video generation and simulation. However, collecting such a dataset is difficult, as most Internet videos are unsuitable for pose estimation. Furthermore, annotating dynamic Internet videos presents significant challenges even for state-of-the-art methods. In this paper, we introduce DynPose-100K, a large-scale dataset of dynamic Internet videos annotated with camera poses. Our collection pipeline addresses filtering using a carefully combined set of task-specific and generalist models. For pose estimation, we combine the latest techniques of point tracking, dynamic masking, and structure-from-motion to achieve improvements over the state-of-the-art approaches. Our analysis and experiments demonstrate that DynPose-100K is both large-scale and diverse across several key attributes, opening up avenues for advancements in various downstream applications.
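The abstract does not spell out the filtering rules, so the following is only an illustrative sketch of one way to chain task-specific detectors and a generalist model into a filter cascade, keeping a video only if it passes every check and recording the first check it fails. All signal names (`num_shot_cuts`, `median_flow_px`, `cartoon_prob`, `vlm_says_real_dynamic`) and all thresholds are assumptions, not the authors' criteria.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class VideoStats:
    # Hypothetical per-video signals; a real pipeline would compute them with
    # task-specific detectors and a generalist vision-language model.
    num_shot_cuts: int           # from a shot-boundary detector
    median_flow_px: float        # camera-motion proxy from optical flow
    cartoon_prob: float          # from a synthetic/cartoon classifier
    vlm_says_real_dynamic: bool  # generalist model's yes/no answer


Filter = Tuple[str, Callable[[VideoStats], bool]]

# Illustrative thresholds. Checks are ordered cheapest-first and evaluation
# stops at the first failure, so if each predicate invoked its model lazily,
# the expensive generalist check would only run on surviving videos.
FILTERS: List[Filter] = [
    ("single_shot", lambda v: v.num_shot_cuts == 0),
    ("camera_moves", lambda v: v.median_flow_px >= 1.0),
    ("not_cartoon", lambda v: v.cartoon_prob < 0.5),
    ("generalist_ok", lambda v: v.vlm_says_real_dynamic),
]


def first_failure(video: VideoStats,
                  filters: List[Filter] = FILTERS) -> Optional[str]:
    """Name of the first filter the video fails, or None if it passes all."""
    for name, passes in filters:
        if not passes(video):
            return name
    return None


def curate(videos: Dict[str, VideoStats]) -> Tuple[List[str], Dict[str, str]]:
    """Split videos into kept IDs and a map of rejected ID -> failing filter."""
    kept: List[str] = []
    rejected: Dict[str, str] = {}
    for vid, stats in videos.items():
        reason = first_failure(stats)
        if reason is None:
            kept.append(vid)
        else:
            rejected[vid] = reason
    return kept, rejected
```

Recording the failing filter per video makes it easy to audit how much of a raw crawl each check removes, which is useful when tuning thresholds.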

Problem

Research questions and friction points this paper is trying to address.

Annotating camera poses in dynamic Internet videos at scale

Filtering unsuitable videos for pose estimation effectively

Improving pose estimation accuracy on dynamic videos beyond state-of-the-art methods

Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines task-specific and generalist models for filtering

Uses point tracking and dynamic masking for pose estimation

Applies structure-from-motion to improve accuracy
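Structure-from-motion with bundle adjustment jointly refines camera poses and 3D points by minimizing the reprojection error of tracked static points. The sketch below only evaluates that cost, assuming a shared pinhole intrinsic matrix `K`, per-frame poses `(R, t)` mapping world to camera coordinates, and a Huber loss chosen here for robustness; the page does not say which robust loss, if any, the authors use. The names `project`, `huber`, and `reprojection_cost` are hypothetical. A solver such as Levenberg-Marquardt would then minimize this cost over poses and points.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]
Pose = Tuple[Mat3, Vec3]                 # (R, t): world -> camera
Obs = Tuple[int, int, Tuple[float, float]]  # (frame, point, observed pixel)


def project(K: Mat3, R: Mat3, t: Vec3, X: Vec3) -> Tuple[float, float]:
    """Pinhole projection of world point X into a camera with pose (R, t)."""
    # Camera-frame coordinates: Xc = R X + t
    Xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    if Xc[2] <= 0:
        raise ValueError("point is behind the camera")
    # Homogeneous image coordinates: p = K Xc, then divide by depth.
    p = [sum(K[i][j] * Xc[j] for j in range(3)) for i in range(3)]
    return p[0] / p[2], p[1] / p[2]


def huber(r2: float, delta: float = 1.0) -> float:
    """Huber loss on a squared residual norm; grows linearly for outliers."""
    r = math.sqrt(r2)
    return 0.5 * r2 if r <= delta else delta * (r - 0.5 * delta)


def reprojection_cost(K: Mat3, poses: List[Pose], points: List[Vec3],
                      observations: List[Obs], delta: float = 1.0) -> float:
    """Sum of robust reprojection errors over observations of static tracks."""
    total = 0.0
    for f, j, (u_obs, v_obs) in observations:
        R, t = poses[f]
        u, v = project(K, R, t, points[j])
        total += huber((u - u_obs) ** 2 + (v - v_obs) ** 2, delta)
    return total
```

For example, with `K = [[100, 0, 50], [0, 100, 50], [0, 0, 1]]`, an identity pose, and the point `(0, 0, 1)`, the point projects to pixel `(50, 50)`; an observation at `(53, 50)` has a 3-pixel residual, which the Huber loss with `delta=1.0` scores as 2.5 rather than the squared-error value of 4.5.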
