🤖 AI Summary
Existing 4D generation methods predominantly rely on implicit deformation fields, which hinder intuitive editing of dynamic 3D content. This work proposes an editable 4D generation framework based on Gaussian skeletonization that constructs a hierarchical articulated representation from monocular videos, decoupling motion into skeleton-driven rigid transformations and non-rigid details modeled with hexplane representations. By introducing an explicit, skeleton-driven hierarchical motion representation for the first time, integrating linear blend skinning, dynamic 3D Gaussians, and hexplane features, the method surpasses existing approaches in generation quality while significantly improving interpretability and editing flexibility, establishing a new paradigm for editable 4D content generation.
📝 Abstract
4D generation has made remarkable progress in synthesizing dynamic 3D objects from input text, images, or videos. However, existing methods often represent motion as an implicit deformation field, which limits direct control and editability. To address this issue, we propose SkeletonGaussian, a novel framework for generating editable dynamic 3D Gaussians from monocular video input. Our approach introduces a hierarchical articulated representation that decomposes motion into sparse rigid motion explicitly driven by a skeleton and fine-grained non-rigid motion. Concretely, we extract a robust skeleton and drive rigid motion via linear blend skinning, followed by a hexplane-based refinement for non-rigid deformations, enhancing interpretability and editability. Experimental results demonstrate that SkeletonGaussian surpasses existing methods in generation quality while enabling intuitive motion editing, establishing a new paradigm for editable 4D generation. Project page: https://wusar.github.io/projects/skeletongaussian/
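The rigid stage of the pipeline drives Gaussians with linear blend skinning (LBS): each point's deformed position is a weighted blend of per-bone rigid transforms. A minimal NumPy sketch of plain LBS is below; the function name, array shapes, and the two-bone example are illustrative assumptions, not details from the paper.

```python
import numpy as np

def linear_blend_skinning(points, weights, bone_transforms):
    """Deform rest-pose points by blending per-bone rigid transforms.

    points:          (N, 3) rest-pose positions (e.g. Gaussian centers)
    weights:         (N, B) skinning weights, each row summing to 1
    bone_transforms: (B, 4, 4) homogeneous rigid transform per bone
    """
    n = points.shape[0]
    homog = np.concatenate([points, np.ones((n, 1))], axis=1)    # (N, 4)
    # Per-point blended transform: sum_b weights[n, b] * T[b]
    blended = np.einsum("nb,bij->nij", weights, bone_transforms)  # (N, 4, 4)
    deformed = np.einsum("nij,nj->ni", blended, homog)            # (N, 4)
    return deformed[:, :3]

# Tiny usage example: bone 0 is identity, bone 1 translates +1 along x.
pts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
w = np.array([[1.0, 0.0],    # point 0 follows bone 0 only
              [0.5, 0.5]])   # point 1 blends both bones equally
T = np.stack([np.eye(4), np.eye(4)])
T[1, 0, 3] = 1.0
out = linear_blend_skinning(pts, w, T)
# point 0 stays at the origin; point 1 moves halfway, to x = 1.5
```

In the paper's framework this rigid LBS result would then be refined by the hexplane-based non-rigid deformation, which is not sketched here.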