🤖 AI Summary
This work addresses high-quality reconstruction, part segmentation, and kinematic analysis of articulated objects when the number of parts is unknown a priori and object visibility is limited. The authors propose a dynamic-static decoupling framework that requires no structural priors: it takes a user-object interaction video together with an initial static scan and automatically infers the number of parts and assigns joints. A dual-Gaussian scene representation enables high-fidelity rendering and motion-aware segmentation. The method then combines motion cues, sequential RANSAC clustering, and kinematic estimation to achieve end-to-end part parsing. Experiments on both simple and complex objects show higher-quality part segmentation than existing methods, along with strong generalization.
📝 Abstract
Articulated objects are ubiquitous in daily life. Our goal is high-quality reconstruction, segmentation of independently moving parts, and analysis of articulation. Recent methods analyse two different articulation states, perform per-point part segmentation, and optimise per-part articulation using cross-state correspondences, given a priori knowledge of the number of parts. Such assumptions greatly limit their applicability and performance, and their robustness degrades when the object is not clearly visible in both states. To address these issues, we present a new framework, Articulation in Motion (AiM). From a user-object interaction video and a start-state scan, we infer part-level decomposition and articulation kinematics, and reconstruct an interactive 3D digital replica. We propose a dual-Gaussian scene representation that is learned from an initial 3DGS scan of the object and a video showing the movement of separate parts. It uses motion cues to segment the object into parts and assign articulation joints. A robust sequential RANSAC then performs part mobility analysis without any part-level structural priors: it clusters moving primitives into rigid parts and estimates their kinematics while automatically determining the number of parts. The resulting decomposition represents each part as a set of 3D Gaussians, enabling high-quality rendering. Our approach yields higher-quality part segmentation than previous methods, without prior knowledge of the number of parts. Extensive experimental analysis on both simple and complex objects validates the effectiveness and strong generalisation ability of our approach. Project page: https://haoai-1997.github.io/AiM/.
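The idea of sequential RANSAC determining the number of parts automatically can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that 3D point correspondences between two states (`src`, `dst`) are already available, whereas the paper works with Gaussian primitives and motion cues from video. The names `kabsch`, `sequential_ransac`, `thresh`, `min_inliers`, and `iters` are hypothetical and chosen only for illustration.

```python
import numpy as np


def kabsch(src, dst):
    """Least-squares rigid transform (R, t) such that dst ~ src @ R.T + t."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)          # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t


def sequential_ransac(src, dst, thresh=0.01, min_inliers=20, iters=200, seed=0):
    """Peel off one rigid motion at a time until too few points remain.

    Returns per-point labels (-1 = unassigned) and a list of (R, t) motions.
    The number of parts is simply the number of motions found.
    Assumes min_inliers >= 3 so that three points can always be sampled.
    """
    rng = np.random.default_rng(seed)
    labels = np.full(len(src), -1)
    remaining = np.arange(len(src))
    motions = []
    while len(remaining) >= min_inliers:
        best = np.array([], dtype=int)
        for _ in range(iters):
            # Minimal sample: three correspondences define a rigid motion.
            pick = rng.choice(remaining, size=3, replace=False)
            R, t = kabsch(src[pick], dst[pick])
            err = np.linalg.norm(src[remaining] @ R.T + t - dst[remaining], axis=1)
            inliers = remaining[err < thresh]
            if len(inliers) > len(best):
                best = inliers
        if len(best) < min_inliers:
            break                                  # no further rigid part is supported
        motions.append(kabsch(src[best], dst[best]))  # refit on all inliers
        labels[best] = len(motions) - 1
        remaining = np.setdiff1d(remaining, best)
    return labels, motions
```

In this sketch the stopping rule (`min_inliers`) is what replaces a user-supplied part count: the loop ends when no remaining rigid motion explains enough points. Each recovered `(R, t)` could then be analysed further, for example to separate rotation-dominated from translation-dominated motion, although how AiM estimates joint parameters is not specified in the abstract.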