AI Summary
This paper addresses the highly ill-posed problem of 4D reconstruction of dynamic scenes from unconstrained monocular video, i.e., recovering temporally coherent 3D geometry with both structural completeness and motion consistency. We propose the first template-free, static-assumption-free explicit SE(3) motion modeling framework. Our key contributions are: (1) an SE(3) motion basis representation enabling soft rigid-body segmentation and long-range motion decoupling; (2) a globally consistent joint optimization over monocular depth, multi-scale optical flow, long-term 2D trajectories, and depth priors; and (3) integration of differentiable rendering with trajectory constraints to enhance geometric and motion fidelity. Our method achieves state-of-the-art performance on long-term 3D/2D motion estimation and novel-view synthesis, significantly improving motion continuity and reconstruction accuracy.
Abstract
Monocular dynamic reconstruction is a challenging and long-standing vision problem due to the highly ill-posed nature of the task. Existing approaches are limited in that they either depend on templates, are effective only in quasi-static scenes, or fail to model 3D motion explicitly. In this work, we introduce a method capable of reconstructing generic dynamic scenes, featuring explicit, full-sequence-long 3D motion, from casually captured monocular videos. We tackle the under-constrained nature of the problem with two key insights: First, we exploit the low-dimensional structure of 3D motion by representing scene motion with a compact set of SE(3) motion bases. Each point's motion is expressed as a linear combination of these bases, facilitating soft decomposition of the scene into multiple rigidly-moving groups. Second, we utilize a comprehensive set of data-driven priors, including monocular depth maps and long-range 2D tracks, and devise a method to effectively consolidate these noisy supervisory signals, resulting in a globally consistent representation of the dynamic scene. Experiments show that our method achieves state-of-the-art performance for both long-range 3D/2D motion estimation and novel view synthesis on dynamic scenes. Project Page: https://shape-of-motion.github.io/
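The core idea of the SE(3) motion-basis representation can be illustrated with a minimal sketch. Here, each point's transform is a soft, weighted blend of a small set of basis SE(3) matrices, in the style of linear blend skinning. All values below (basis motions, weights, points) are hypothetical, and the paper's exact blending scheme may differ from this naive matrix-space blend.

```python
import numpy as np

def se3_from_axis_angle(axis, angle, t):
    """Build a 4x4 SE(3) matrix from an axis-angle rotation and a translation."""
    axis = np.asarray(axis, dtype=float) / np.linalg.norm(axis)
    # Rodrigues' rotation formula via the cross-product matrix K.
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    R = np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# A small set of B = 2 motion bases at one time step (hypothetical values):
# one slow rotation about the z-axis, one pure translation along x.
bases = np.stack([
    se3_from_axis_angle([0, 0, 1], 0.1, [0.0, 0.0, 0.0]),
    se3_from_axis_angle([0, 1, 0], 0.0, [0.2, 0.0, 0.0]),
])  # shape (B, 4, 4)

# Per-point basis weights: a soft assignment of each point to the
# rigidly-moving groups (each row sums to 1).
weights = np.array([[0.9, 0.1],
                    [0.1, 0.9]])  # shape (N, B)

points = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])  # shape (N, 3)

# Blend: each point gets its own transform as a weighted sum of the bases.
blended = np.einsum('nb,bij->nij', weights, bases)  # shape (N, 4, 4)

# Apply each point's blended transform in homogeneous coordinates.
points_h = np.concatenate([points, np.ones((points.shape[0], 1))], axis=1)
moved = np.einsum('nij,nj->ni', blended, points_h)[:, :3]  # shape (N, 3)
```

A point whose weight vector is one-hot moves exactly rigidly with the corresponding basis; intermediate weights give the soft decomposition into rigid groups that the abstract describes.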