🤖 AI Summary
This paper addresses articulation prediction for a single static 3D shape, i.e., inferring how a shape's parts can move without any human-annotated articulation labels. The proposed method introduces a geometry-driven self-supervised pretraining paradigm: using physical validity (part motions that cause no detachments or collisions) as a supervisory signal, a geometric search automatically discovers plausible candidate articulations for each part. A transformer-based architecture then learns from these discovered motions to predict articulation directly from a static snapshot. Evaluated on the PartNet-Mobility dataset, the approach achieves state-of-the-art articulation-inference results, improving the physical plausibility of predicted part motions without requiring manual annotation.
📝 Abstract
We present GEOPARD, a transformer-based architecture for predicting articulation from a single static snapshot of a 3D shape. The key idea of our method is a pretraining strategy that allows our transformer to learn plausible candidate articulations for 3D shapes through a geometry-driven search, without manual articulation annotations. The search automatically discovers physically valid part motions that cause no detachments from, or collisions with, other shape parts. Our experiments indicate that this geometric pretraining strategy, along with carefully designed choices in our transformer architecture, yields state-of-the-art results for articulation inference on the PartNet-Mobility dataset.
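The physical-validity criterion behind the geometric search can be illustrated with a simple distance-based test: a candidate motion of a part is kept only if the moved part neither interpenetrates (gets too close to) nor detaches (drifts too far) from the rest of the shape. The sketch below is a minimal point-cloud illustration of that idea; the function names, the rotation-about-a-pivot parameterization, and the distance thresholds are all hypothetical and not the paper's actual implementation:

```python
import numpy as np

def min_pairwise_dist(a, b):
    # Smallest Euclidean distance between any point in a and any point in b.
    diffs = a[:, None, :] - b[None, :, :]
    return np.sqrt((diffs ** 2).sum(-1)).min()

def is_valid_motion(part, rest, rotation, pivot,
                    collision_eps=0.02, detach_eps=0.15):
    """Hypothetical validity test for a candidate articulation.

    Rotates the point cloud `part` about `pivot` by the 3x3 matrix
    `rotation`, then checks that the moved part stays farther than
    `collision_eps` from the remaining geometry `rest` (no collision)
    but closer than `detach_eps` (no detachment).
    """
    moved = (part - pivot) @ rotation.T + pivot
    d = min_pairwise_dist(moved, rest)
    return bool(collision_eps < d < detach_eps)
```

In a search loop, candidate joint axes and angles would be sampled and only those passing `is_valid_motion` retained as pseudo-labels for pretraining; real systems would use mesh-level collision tests rather than this point-distance heuristic.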