AI Summary
This work addresses template-free 3D modeling of articulated objects from monocular video, targeting high-fidelity novel-view synthesis, editability, and controllable animation. We propose a skeleton-guided differentiable 3D Gaussian deformation framework with two key components: (i) a skeleton-aware node control mechanism that automatically extracts a sparse skeletal structure, coupled to both semantics and motion, directly from the Gaussian field; and (ii) learnable skinning weights combined with a pose-dependent neural deformation module that jointly optimize geometry and appearance. Our method requires no manual initialization or category-specific priors. Evaluated on diverse videos of articulated objects, it significantly outperforms existing template-free approaches, and it enables real-time pose retargeting, motion transfer, and interactive editing while preserving high-fidelity, photorealistic novel-view rendering.
Abstract
This paper considers the problem of modeling articulated objects captured in 2D videos to enable novel view synthesis, while also being easily editable, drivable, and re-posable. To tackle this challenging problem, we propose RigGS, a new paradigm that leverages a 3D Gaussian representation and a skeleton-based motion representation to model dynamic objects without additional template priors. Specifically, we first propose skeleton-aware node-controlled deformation, which deforms a canonical 3D Gaussian representation over time to initialize the modeling process, producing candidate skeleton nodes that are then simplified into a sparse 3D skeleton according to their motion and semantic information. Based on the resulting skeleton, we design learnable skin deformations and pose-dependent detailed deformations, which make it easy to deform the 3D Gaussian representation to generate new actions and render high-quality images from novel views. Extensive experiments demonstrate that our method can easily generate realistic new actions for objects and achieve high-quality rendering.
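The "learnable skin deformations" described above follow the standard linear blend skinning (LBS) pattern: each Gaussian center is moved by a weighted combination of per-bone rigid transforms, where the weights are the (learnable) skinning weights. A minimal numpy sketch of that blending step, with all names hypothetical and no claim to match the paper's exact formulation:

```python
import numpy as np

def lbs_deform(points, weights, rotations, translations):
    """Deform 3D points (e.g. canonical Gaussian centers) by linear blend skinning.

    points:       (N, 3) canonical positions
    weights:      (N, B) skinning weights; each row sums to 1
    rotations:    (B, 3, 3) per-bone rotation matrices
    translations: (B, 3) per-bone translations
    """
    # Apply every bone transform to every point -> (B, N, 3)
    transformed = np.einsum('bij,nj->bni', rotations, points) + translations[:, None, :]
    # Blend the per-bone results with the skinning weights -> (N, 3)
    return np.einsum('nb,bni->ni', weights, transformed)

# Example: one point shared equally by two bones that translate
# along x and y respectively; the blend lands halfway between.
pts = np.array([[0.0, 0.0, 0.0]])
w = np.array([[0.5, 0.5]])
R = np.stack([np.eye(3), np.eye(3)])
t = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
print(lbs_deform(pts, w, R, t))  # [[0.5 0.5 0. ]]
```

In a full pipeline, the pose-dependent detailed deformation would add a learned residual on top of this rigid blend; the sketch shows only the skinning term.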