🤖 AI Summary
Existing methods for text-driven 3D articulated object generation struggle to simultaneously achieve structural flexibility and high geometric fidelity, often being constrained by predefined topologies or static dataset retrieval. This paper introduces the first text-conditioned controllable 3D articulated object generation framework: it models articulated objects as hierarchical token sequences, jointly generating kinematic topology and signed distance function (SDF)-based implicit geometry. For the first time, our unified architecture co-optimizes structural variability—supporting arbitrary numbers of parts—and SDF-based geometric fidelity. Leveraging a Transformer backbone with tree-structured tokenization, we realize end-to-end generation from text to articulated structure to high-quality geometry. Experiments demonstrate significant improvements over prior work in FID, Chamfer distance, and motion plausibility metrics, enabling diverse, high-fidelity, and kinematically valid 3D articulated object synthesis.
📝 Abstract
This paper presents a novel framework for modeling and conditional generation of 3D articulated objects. Troubled by flexibility-quality tradeoffs, existing methods are often limited to using predefined structures or retrieving shapes from static datasets. To address these challenges, we parameterize an articulated object as a tree of tokens and employ a transformer to generate both the object's high-level geometry code and its kinematic relations. Subsequently, each sub-part's geometry is further decoded using a signed-distance-function (SDF) shape prior, facilitating the synthesis of high-quality 3D shapes. Our approach enables the generation of diverse objects with high-quality geometry and varying number of parts. Comprehensive experiments on conditional generation from text descriptions demonstrate the effectiveness and flexibility of our method.