ArtLLM: Generating Articulated Assets via 3D LLM

📅 2026-03-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing methods for generating articulated 3D assets with joint structures suffer from inefficiency, poor generalization, and geometric redundancy, hindering the creation of high-quality interactive digital environments. This work proposes ArtLLM, a framework built on a 3D multimodal large language model that autoregressively predicts a variable number of parts and their associated joints directly from a complete mesh. The model jointly infers the underlying kinematic structure, which then conditions a 3D generative model to synthesize high-fidelity part geometry. By unifying the modeling of part layout and motion relationships, ArtLLM overcomes the limitations of conventional optimization- and retrieval-based paradigms. Evaluated on PartNet-Mobility, the method significantly outperforms existing approaches in both part-layout and joint-prediction accuracy, and it demonstrates successful applications in real-world object digital twins and scalable robot learning.

📝 Abstract
Creating interactive digital environments for gaming, robotics, and simulation relies on articulated 3D objects whose functionality emerges from their part geometry and kinematic structure. However, existing approaches remain fundamentally limited: optimization-based reconstruction methods require slow, per-object joint fitting and typically handle only simple, single-joint objects, while retrieval-based methods assemble parts from a fixed library, leading to repetitive geometry and poor generalization. To address these challenges, we introduce ArtLLM, a novel framework for generating high-quality articulated assets directly from complete 3D meshes. At its core is a 3D multimodal large language model trained on a large-scale articulation dataset curated from both existing articulation datasets and procedurally generated objects. Unlike prior work, ArtLLM autoregressively predicts a variable number of parts and joints, inferring their kinematic structure in a unified manner from the object's point cloud. This articulation-aware layout then conditions a 3D generative model to synthesize high-fidelity part geometries. Experiments on the PartNet-Mobility dataset show that ArtLLM significantly outperforms state-of-the-art methods in both part layout accuracy and joint prediction, while generalizing robustly to real-world objects. Finally, we demonstrate its utility in constructing digital twins, highlighting its potential for scalable robot learning.
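The pipeline the abstract describes (a point cloud goes in, a variable-length sequence of parts and joints comes out, and the resulting articulation-aware layout conditions geometry synthesis) can be sketched as a minimal decoding loop. All names, fields, and values below are illustrative assumptions, not the paper's actual tokenization or API:

```python
# Hypothetical sketch of an ArtLLM-style articulation decode: the model
# autoregressively emits part/joint tokens until an end-of-sequence
# marker, so the number of parts is variable per object.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Joint:
    joint_type: str           # e.g. "revolute", "prismatic", "fixed"
    axis: tuple               # unit direction of the joint axis
    origin: tuple             # a point the axis passes through
    parent: int = -1          # index of the parent part (-1 = root/base)

@dataclass
class Part:
    name: str
    bbox_min: tuple           # axis-aligned layout box of the part
    bbox_max: tuple
    joint: Optional[Joint] = None   # None for the static base part

def decode_articulation(token_stream):
    """Consume decoded tokens until "<eos>", building the part layout
    and kinematic structure jointly as a variable-length sequence."""
    parts = []
    for tok in token_stream:
        if tok == "<eos>":
            break
        parts.append(Part(
            name=tok["name"],
            bbox_min=tok["bbox_min"],
            bbox_max=tok["bbox_max"],
            joint=Joint(**tok["joint"]) if tok.get("joint") else None,
        ))
    return parts

# Toy decoded sequence for a single-door cabinet (values are made up):
stream = [
    {"name": "body", "bbox_min": (0, 0, 0), "bbox_max": (1, 1, 1)},
    {"name": "door", "bbox_min": (0, 0, 1.0), "bbox_max": (1, 1, 1.05),
     "joint": {"joint_type": "revolute", "axis": (0, 1, 0),
               "origin": (0, 0, 1.0), "parent": 0}},
    "<eos>",
]
parts = decode_articulation(stream)
```

In the actual system this layout would condition the 3D generative model that fills each box with high-fidelity part geometry; the sketch only covers the structure-prediction half.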
Problem

Research questions and friction points this paper is trying to address.

articulated 3D objects
3D asset generation
kinematic structure
part geometry
digital twins
Innovation

Methods, ideas, or system contributions that make the work stand out.

Articulated 3D Generation
3D Multimodal LLM
Kinematic Structure Prediction
Autoregressive Part-Joint Modeling
Digital Twin Construction
Penghao Wang
ShanghaiTech University; Tencent Hunyuan
Siyuan Xie
ShanghaiTech University
Hongyu Yan
Tencent Hunyuan; HKUST
Xianghui Yang
Tencent Hunyuan
Jingwei Huang
Principal Research Scientist, Tencent
Computer Graphics · Computer Vision · Geometry Processing
Chunchao Guo
Tencent Hunyuan
Jiayuan Gu
Assistant Professor, ShanghaiTech University
Embodied AI · 3D Vision