🤖 AI Summary
Joint geometric and motion reconstruction of articulated objects is a critical challenge in digital twin construction. Existing approaches predominantly adopt a decoupled paradigm—reconstructing geometry first, then aligning motion—which leads to complex pipelines, poor generalization, and limited scalability to multi-part articulated systems. This paper introduces MPArt, a unified differentiable 3D Gaussian representation framework that jointly embeds geometry and articulation within a shared parametric space, enabling end-to-end optimization. MPArt comprises a part-aware motion decomposition network and a differentiable rendering-based articulated Gaussian modeling module. Evaluated on the newly introduced MPArt-90 multi-part benchmark, it demonstrates strong generalization to articulated systems with up to 20 parts and achieves state-of-the-art accuracy in both part-level geometry reconstruction and motion estimation. The method directly enables downstream applications including robotic simulation and human-scene interaction modeling.
📝 Abstract
Reconstructing articulated objects is essential for building digital twins of interactive environments. However, prior methods typically decouple geometry and motion by first reconstructing object shape in distinct states and then estimating articulation through post-hoc alignment. This separation complicates the reconstruction pipeline and restricts scalability, especially for objects with complex, multi-part articulation. We introduce a unified representation that jointly models geometry and motion using articulated 3D Gaussians. This formulation improves robustness in motion decomposition and supports articulated objects with up to 20 parts, significantly outperforming prior approaches that often struggle beyond 2–3 parts due to brittle initialization. To systematically assess scalability and generalization, we propose MPArt-90, a new benchmark consisting of 90 articulated objects across 20 categories, spanning diverse part counts and motion configurations. Extensive experiments show that our method consistently achieves superior accuracy in part-level geometry reconstruction and motion estimation across a broad range of object types. We further demonstrate applicability to downstream tasks such as robotic simulation and human-scene interaction modeling, highlighting the potential of unified articulated representations in scalable physical modeling.
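To make the idea of a shared parametric space for geometry and motion concrete, here is a minimal NumPy sketch. All names, the soft part-assignment scheme, and the revolute-only joint model are illustrative assumptions, not the paper's actual parameterization: each Gaussian center is displaced by a softly weighted blend of per-part joint transforms, so geometry (centers, part assignments) and motion (joint axes, pivots, angles) sit in one jointly optimizable parameter set.

```python
import numpy as np

def rodrigues(axis, angle):
    """Rotation matrix for a rotation of `angle` radians about unit `axis`."""
    axis = axis / np.linalg.norm(axis)
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def articulate_gaussians(means, part_logits, joint_axes, joint_pivots, joint_angles):
    """Move each Gaussian center by a soft blend of per-part revolute transforms.

    means:        (N, 3) Gaussian centers (geometry)
    part_logits:  (N, P) soft part-assignment scores, softmaxed below (geometry)
    joint_axes:   (P, 3) rotation axis per part (motion)
    joint_pivots: (P, 3) pivot point per part (motion)
    joint_angles: (P,)   rotation angle per part (motion)
    """
    # Softmax over parts: every Gaussian belongs softly to all parts,
    # which keeps the assignment differentiable in a gradient framework.
    weights = np.exp(part_logits - part_logits.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    out = np.zeros_like(means)
    for p in range(len(joint_angles)):
        R = rodrigues(joint_axes[p], joint_angles[p])
        moved = (means - joint_pivots[p]) @ R.T + joint_pivots[p]
        out += weights[:, p:p + 1] * moved
    return out

# Example: one Gaussian hard-assigned to part 0, rotated 90° about the z-axis.
means = np.array([[1.0, 0.0, 0.0]])
logits = np.array([[10.0, -10.0]])          # strongly favors part 0
axes = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]])
pivots = np.zeros((2, 3))
angles = np.array([np.pi / 2, 0.0])
moved = articulate_gaussians(means, logits, axes, pivots, angles)
# moved ≈ [[0., 1., 0.]]
```

In the actual method these parameters would be optimized end-to-end through a differentiable renderer; NumPy stands in here only to show how a single parameter set can couple part-level geometry with joint motion.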