🤖 AI Summary
This work addresses the challenge of achieving geometric consistency in dynamic 3D scene reconstruction from ordinary videos, particularly under extreme viewing angles and complex articulated motion. The authors propose a novel approach that leverages a static pre-scanned model as an explicit prior for both geometry and appearance, constructing a surface-aligned, mesh-based Gaussian primitive representation. To enforce motion coherence, they introduce a CNN-based motion parameterization that implicitly regularizes the motion of neighboring points by correlating their predicted displacements. This formulation significantly enhances geometric consistency and robustness in dynamic reconstruction, outperforming state-of-the-art methods in both rendering quality and 3D tracking accuracy, especially in scenarios involving extreme viewpoints and intricate joint movements.
📝 Abstract
Dynamic scene reconstruction from casual videos has seen remarkable recent progress. Numerous approaches attempt to overcome the ill-posedness of the task by distilling priors from 2D foundation models and by imposing hand-crafted regularization on the optimized motion. However, these methods struggle to reconstruct scenes from extreme novel viewpoints, especially when highly articulated motions are present. In this paper, we present DRoPS, a novel approach that uses a static pre-scan of the dynamic object as an explicit geometric and appearance prior. While existing state-of-the-art methods fail to fully exploit such a pre-scan, DRoPS leverages this setup to effectively constrain the solution space and ensure geometric consistency throughout the sequence. Our contribution is twofold: first, we establish a grid-structured, surface-aligned model by organizing Gaussian primitives into pixel grids anchored to the object surface. Second, exploiting this grid structure, we parameterize motion with a CNN conditioned on the grids, injecting strong implicit regularization that correlates the motion of nearby points. Extensive experiments demonstrate that our method significantly outperforms the current state of the art in rendering quality and 3D tracking accuracy.
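The abstract does not detail the architecture, but the key intuition of the second contribution can be illustrated in isolation: when per-Gaussian motion is predicted by a convolution over a grid of primitives, adjacent cells share most of their receptive field, so their predicted displacements are automatically correlated. Below is a minimal NumPy sketch of that effect; the grid size, latent channels, and the single random convolution layer are hypothetical stand-ins, not the authors' actual CNN:

```python
import numpy as np

def conv2d(x, w):
    """Naive same-padded 2D convolution: x is (H, W, Cin), w is (k, k, Cin, Cout)."""
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    H, W, _ = x.shape
    out = np.empty((H, W, w.shape[-1]))
    for i in range(H):
        for j in range(W):
            patch = xp[i:i + k, j:j + k, :]  # (k, k, Cin) receptive field
            out[i, j] = np.tensordot(patch, w, axes=([0, 1, 2], [0, 1, 2]))
    return out

rng = np.random.default_rng(0)
H, W, C = 16, 16, 8
latent = rng.normal(size=(H, W, C))       # hypothetical per-frame latent grid,
                                          # one cell per surface-anchored Gaussian
w1 = rng.normal(size=(3, 3, C, 3)) * 0.1  # conv kernel mapping latents -> xyz offsets

offsets = conv2d(latent, w1)              # (H, W, 3) displacement per Gaussian

# Horizontally adjacent cells share 6 of 9 kernel taps, so on average their
# offsets lie closer together than offsets of cells with disjoint receptive fields.
near = np.mean(np.linalg.norm(offsets[:, 1:] - offsets[:, :-1], axis=-1))
far = np.mean(np.linalg.norm(offsets[:8, :8] - offsets[8:, 8:], axis=-1))
```

Even with random weights, `near` comes out smaller than `far`: the shared taps induce the neighbor correlation that the paper exploits as implicit motion regularization, before any loss term is added.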