🤖 AI Summary
Existing 3D procedural reconstruction methods are limited by weak expressivity of domain-specific languages and insufficient training data, hindering accurate modeling of complex geometry and topology. This paper introduces the first large-scale paired dataset framework for point cloud-to-executable Blender Python script generation. We first design a high-expressivity subset of the Blender API and synthesize a million-scale object–code paired dataset. Then, we propose a multimodal large language model that enables end-to-end generation of semantically decomposed, editable scripts from raw point clouds. Our method significantly outperforms prior approaches on shape-to-code reconstruction, supporting fine-grained geometric editing and topological restructuring. It establishes, for the first time, a closed-loop generation pipeline from raw perceptual input to programmable, editable 3D content—advancing procedural 3D understanding and controllable 3D generation.
📝 Abstract
Reconstructing 3D objects into editable programs is pivotal for applications like reverse engineering and shape editing. However, existing methods often rely on limited domain-specific languages (DSLs) and small-scale datasets, restricting their ability to model complex geometries and structures. To address these challenges, we introduce MeshCoder, a novel framework that reconstructs complex 3D objects from point clouds into editable Blender Python scripts. We develop a comprehensive set of expressive Blender Python APIs capable of synthesizing intricate geometries. Leveraging these APIs, we construct a large-scale paired object-code dataset, where the code for each object is decomposed into distinct semantic parts. Subsequently, we train a multimodal large language model (LLM) that translates 3D point cloud into executable Blender Python scripts. Our approach not only achieves superior performance in shape-to-code reconstruction tasks but also facilitates intuitive geometric and topological editing through convenient code modifications. Furthermore, our code-based representation enhances the reasoning capabilities of LLMs in 3D shape understanding tasks. Together, these contributions establish MeshCoder as a powerful and flexible solution for programmatic 3D shape reconstruction and understanding.