🤖 AI Summary
To address the high cost, labor-intensive modeling, and poor scalability of articulated asset creation in robotic simulation environments, this paper proposes ArtiWorld, a scene-aware pipeline whose core component, Arti4URDF, enables end-to-end automatic generation of URDF models directly executable in simulation from static 3D scenes (e.g., point clouds). Arti4URDF integrates point-cloud geometric analysis, large language models (LLMs) encoding physical commonsense priors, and URDF-structured prompt engineering to support text-guided identification of movable parts, kinematic chain inference, and geometry-preserving joint constraint modeling. Evaluated on both synthetic and real-world scanned scenes, it achieves state-of-the-art performance, producing URDF models with high geometric fidelity and out-of-the-box simulation compatibility. The approach significantly lowers the barrier to constructing interactive robotic simulation environments.
📝 Abstract
Building interactive simulators and scalable robot-learning environments requires a large number of articulated assets. However, most existing 3D assets in simulation are rigid, and manually converting them into articulated objects is extremely labor- and cost-intensive. This raises a natural question: can we automatically identify articulable objects in a scene and convert them into articulated assets directly? In this paper, we present ArtiWorld, a scene-aware pipeline that localizes candidate articulable objects from textual scene descriptions and reconstructs executable URDF models that preserve the original geometry. At the core of this pipeline is Arti4URDF, which leverages 3D point clouds, the prior knowledge of a large language model (LLM), and a URDF-oriented prompt design to rapidly convert rigid objects into interactive URDF-based articulated objects while maintaining their 3D shape. We evaluate ArtiWorld at three levels: 3D simulated objects, full 3D simulated scenes, and real-world scanned scenes. Across all three settings, our method consistently outperforms existing approaches and achieves state-of-the-art performance, while preserving object geometry and correctly capturing object interactivity to produce usable URDF-based articulated models. This provides a practical path toward building interactive, robot-ready simulation environments directly from existing 3D assets. Code and data will be released.
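To make the target representation concrete, the sketch below shows the kind of URDF an approach like Arti4URDF would emit for a rigid cabinet scan split into a body and a door connected by an inferred revolute joint. All names, mesh filenames, and joint limits here are illustrative assumptions, not output from the paper's pipeline.

```xml
<?xml version="1.0"?>
<!-- Illustrative URDF for an articulated cabinet: a fixed body link
     and a door link attached via a revolute (hinge) joint.
     Mesh files and joint parameters are hypothetical. -->
<robot name="cabinet">
  <link name="body">
    <visual>
      <geometry><mesh filename="cabinet_body.obj"/></geometry>
    </visual>
    <collision>
      <geometry><mesh filename="cabinet_body.obj"/></geometry>
    </collision>
  </link>
  <link name="door">
    <visual>
      <geometry><mesh filename="cabinet_door.obj"/></geometry>
    </visual>
    <collision>
      <geometry><mesh filename="cabinet_door.obj"/></geometry>
    </collision>
  </link>
  <!-- Hinge along the door's vertical edge; the door swings
       from closed (0 rad) to roughly 90 degrees (1.57 rad). -->
  <joint name="door_hinge" type="revolute">
    <parent link="body"/>
    <child link="door"/>
    <origin xyz="0.3 0.0 0.0" rpy="0 0 0"/>
    <axis xyz="0 0 1"/>
    <limit lower="0.0" upper="1.57" effort="10.0" velocity="1.0"/>
  </joint>
</robot>
```

A file in this form can be loaded directly by common simulators (e.g., via PyBullet's `loadURDF`), which is what "directly executable in simulation" refers to.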