🤖 AI Summary
Reconstructing the geometry and articulation parameters of articulated objects from sparse, uncalibrated multi-state views is a highly ill-posed problem. This work proposes ArtSplat, the first feed-forward framework for articulated 3D Gaussian Splatting, which recovers both geometric structure and joint parameters in a single forward pass rather than through per-object optimization. The method introduces a per-pixel joint map representation that brings joint parameter estimation into the feed-forward pipeline, and a Cross-State Attention (CSA) mechanism with state tokens that captures discrete motion across the input articulation states. Evaluation on 68 articulated objects from PartNet-Mobility, covering both single- and multi-joint configurations, shows competitive accuracy in geometry and joint estimation while running over 400× faster than the baselines.
📝 Abstract
Articulated object reconstruction from sparse-view images is an ill-posed problem that requires simultaneous inference of geometry and underlying articulation structure. Existing methods for articulated object reconstruction based on NeRF and 3D Gaussian Splatting (3DGS) typically rely on dense views or strong priors (e.g., depth maps, joint types, predefined number of joints) and require costly per-object optimization. In this paper, we propose ArtSplat, the first feed-forward framework for articulated 3D Gaussian Splatting. It reconstructs both geometry and joint parameters from sparse multi-view images across multiple articulation states in a single forward pass. To address the challenges of single-pass articulated reconstruction, we introduce a per-pixel joint map representation that enables the integration of joint parameter estimation into the feed-forward pipeline. We further propose a Cross-State Attention (CSA) mechanism with state tokens, which effectively captures discrete motion across input states. Experiments on 68 articulated objects from PartNet-Mobility, including both single- and multi-joint configurations, demonstrate that ArtSplat achieves competitive performance in both geometry and joint estimation, while being over 400 times faster than baselines.
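The abstract does not say how a per-pixel joint map is reduced to a single set of joint parameters. As a rough illustration only, the sketch below assumes each pixel predicts a 3D joint axis direction together with a non-negative weight (for example a part mask or confidence), and pools them with a weighted mean followed by renormalization. The function name `pool_joint_map`, the array layout, and the pooling rule are all assumptions made for this example, not ArtSplat's actual procedure.

```python
import numpy as np


def pool_joint_map(axis_map, weight_map):
    """Collapse a per-pixel joint map into one unit axis direction.

    axis_map:   (H, W, 3) per-pixel axis predictions (hypothetical layout)
    weight_map: (H, W) non-negative per-pixel weights (mask or confidence)
    """
    axis_map = np.asarray(axis_map, dtype=float)
    weight_map = np.asarray(weight_map, dtype=float)

    total = weight_map.sum()
    if total <= 0:
        raise ValueError("weight_map has no positive support")

    # Weighted mean over all pixels; broadcasting expands weights to (H, W, 1).
    mean_axis = (axis_map * weight_map[..., None]).sum(axis=(0, 1)) / total

    norm = np.linalg.norm(mean_axis)
    if norm == 0:
        raise ValueError("pooled axis is degenerate")
    return mean_axis / norm


# Toy usage: noisy per-pixel predictions around a known axis.
rng = np.random.default_rng(0)
true_axis = np.array([0.0, 0.0, 1.0])
axis_map = true_axis + 0.05 * rng.standard_normal((4, 4, 3))
weights = np.ones((4, 4))
estimated_axis = pool_joint_map(axis_map, weights)
```

Averaging directions this way presumes that all pixels agree on the sign of the axis. A real pipeline would also need to handle the joint origin, the joint type, and objects with several joints, none of which this sketch covers.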