🤖 AI Summary
This work tackles dynamic 3D reconstruction of deformable objects from unstructured monocular videos without camera-pose annotations, a setting where severe non-rigid deformation, large-scale camera motion, and sparse viewpoint coverage cause conventional methods to fail. The authors propose the first pose-agnostic, category-agnostic framework for articulated 3D reconstruction. The method combines generative 3D priors with differentiable rendering and introduces an object-centric, personalized pose estimator. Optimization is driven jointly by supervision from a pre-trained image-to-3D model, long-term 2D point-trajectory regularization, and a deformable 3D Gaussian representation. Extensive evaluation across diverse dynamic scenes demonstrates strong robustness and generalization, with qualitative and quantitative results that significantly outperform state-of-the-art approaches to articulated reconstruction from unposed monocular video.
📝 Abstract
We present PAD3R, a method for reconstructing deformable 3D objects from casually captured, unposed monocular videos. Unlike existing approaches, PAD3R handles long video sequences featuring substantial object deformation, large-scale camera movement, and limited view coverage that typically challenge conventional systems. At its core, our approach trains a personalized, object-centric pose estimator, supervised by a pre-trained image-to-3D model, which guides the optimization of a deformable 3D Gaussian representation. The optimization is further regularized by long-term 2D point tracking over the entire input video. By combining generative priors and differentiable rendering, PAD3R reconstructs high-fidelity, articulated 3D representations of objects in a category-agnostic way. Extensive qualitative and quantitative results show that PAD3R is robust and generalizes well across challenging scenarios, highlighting its potential for dynamic scene understanding and 3D content creation.
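The abstract describes optimization driven by three signals: a rendering loss against the input frames, supervision from a pre-trained image-to-3D model, and a long-term 2D point-tracking regularizer. A minimal sketch of how such a joint objective might be composed is shown below; all function names, loss forms, and weights are illustrative assumptions, not PAD3R's actual implementation.

```python
import numpy as np

# Hypothetical sketch of a joint objective with three terms, as suggested
# by the abstract. The loss forms and weights are assumptions for
# illustration only.

def rendering_loss(rendered, frame):
    # Photometric L2 between the rendered Gaussians and the input frame.
    return float(np.mean((rendered - frame) ** 2))

def prior_loss(rendered, prior_view):
    # Agreement with a view produced by the pre-trained image-to-3D prior.
    return float(np.mean((rendered - prior_view) ** 2))

def track_loss(projected_pts, tracked_pts):
    # Long-term 2D point-trajectory regularization: projected Gaussian
    # centers should follow the tracked 2D trajectories across the video.
    return float(np.mean(np.linalg.norm(projected_pts - tracked_pts, axis=-1)))

def total_loss(rendered, frame, prior_view, projected_pts, tracked_pts,
               w_render=1.0, w_prior=0.5, w_track=0.1):
    # Weighted sum of the three terms; the actual weighting in PAD3R is
    # not specified in the abstract.
    return (w_render * rendering_loss(rendered, frame)
            + w_prior * prior_loss(rendered, prior_view)
            + w_track * track_loss(projected_pts, tracked_pts))
```

In a full system, each term would be differentiable and backpropagated to the Gaussian parameters, the deformation field, and the personalized pose estimator; this sketch only illustrates how the supervision signals combine.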