🤖 AI Summary
Problem: Existing methods lack an end-to-end solution for inferring the complete articulated structure of an object (its 3D movable parts, kinematic topology, and motion constraints) directly from a single static 3D mesh.
Method: Particulate is a feed-forward framework built around the Part Articulation Transformer (PAT), a transformer network that operates on a point cloud of the input mesh and is trained end-to-end on a diverse collection of articulated 3D assets from public datasets. PAT natively supports multi-joint objects, and the framework also handles AI-generated 3D assets.
Contribution/Results: We introduce a new, challenging benchmark for 3D articulation estimation, curated from high-quality public 3D assets, together with an evaluation protocol redesigned to be more consistent with human preferences. Experiments show that Particulate significantly outperforms state-of-the-art approaches and produces a fully articulated 3D model in seconds, much faster than prior methods that require per-object optimization. Combined with an off-the-shelf image-to-3D generator, it enables a fully automatic pipeline: “single image → 3D generation → articulation structure extraction.”
📝 Abstract
We present Particulate, a feed-forward approach that, given a single static 3D mesh of an everyday object, directly infers all attributes of the underlying articulated structure, including its 3D parts, kinematic structure, and motion constraints. At its core is a transformer network, Part Articulation Transformer, which processes a point cloud of the input mesh using a flexible and scalable architecture to predict all the aforementioned attributes with native multi-joint support. We train the network end-to-end on a diverse collection of articulated 3D assets from public datasets. During inference, Particulate lifts the network's feed-forward prediction to the input mesh, yielding a fully articulated 3D model in seconds, much faster than prior approaches that require per-object optimization. Particulate can also accurately infer the articulated structure of AI-generated 3D assets, enabling full-fledged extraction of articulated 3D objects from a single (real or synthetic) image when combined with an off-the-shelf image-to-3D generator. We further introduce a new challenging benchmark for 3D articulation estimation curated from high-quality public 3D assets, and redesign the evaluation protocol to be more consistent with human preferences. Quantitative and qualitative results show that Particulate significantly outperforms state-of-the-art approaches.
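To make the three predicted attributes concrete (3D parts, kinematic structure, and motion constraints), here is a minimal sketch of how such an articulated structure could be represented. It is not the paper's data format: the class names, field names, and the revolute/prismatic joint vocabulary are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Joint:
    """Motion constraint linking a part to its parent (hypothetical schema)."""
    kind: str                            # assumed vocabulary: "revolute" or "prismatic"
    axis: tuple[float, float, float]     # motion axis direction
    origin: tuple[float, float, float]   # a point on the axis
    lower: float                         # lower motion limit (radians or metres)
    upper: float                         # upper motion limit

    def clamp(self, value: float) -> float:
        """Restrict a requested joint value to the allowed motion range."""
        return max(self.lower, min(self.upper, value))


@dataclass
class Part:
    """One movable or static part, stored as a set of mesh face indices."""
    name: str
    face_ids: list[int]
    parent: Optional[int] = None     # index of the parent part; None for the root
    joint: Optional[Joint] = None    # joint to the parent; None for the root


@dataclass
class ArticulatedObject:
    parts: list[Part] = field(default_factory=list)

    def is_valid_tree(self) -> bool:
        """Check that the parent links form a single rooted tree."""
        n = len(self.parts)
        roots = [i for i, p in enumerate(self.parts) if p.parent is None]
        if len(roots) != 1:
            return False
        for start in range(n):
            seen = set()
            node = start
            # Walk up to the root; a repeated or out-of-range index is invalid.
            while node is not None:
                if node in seen or not (0 <= node < n):
                    return False
                seen.add(node)
                node = self.parts[node].parent
        return True


# Illustrative example: a cabinet with a hinged door and a sliding drawer.
cabinet = ArticulatedObject(parts=[
    Part("body", face_ids=[0, 1, 2]),
    Part("door", face_ids=[3, 4], parent=0,
         joint=Joint("revolute", (0.0, 0.0, 1.0), (0.5, 0.0, 0.0), 0.0, 1.57)),
    Part("drawer", face_ids=[5], parent=0,
         joint=Joint("prismatic", (0.0, 1.0, 0.0), (0.0, 0.0, 0.0), 0.0, 0.3)),
])
```

In this sketch a multi-joint object is simply an object with more than one non-root part, each carrying its own joint and motion limits.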