MeshLLM: Empowering Large Language Models to Progressively Understand and Generate 3D Mesh

📅 2025-08-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing methods must fit each mesh within a large language model's (LLM's) token-length limit, which restricts the scale of text-to-3D-mesh datasets; conventional mesh serialization also often discards critical 3D topological and spatial structural information. To address both issues, we propose Primitive-Mesh decomposition, a strategy that enables, for the first time, a high-quality text-mesh paired dataset of over 1.5 million samples. We further introduce a training paradigm centered on face connectivity reasoning and local mesh assembly, which explicitly models vertex-face topological relationships and local geometric structure. Without increasing model parameters, the framework significantly improves LLMs' understanding and generation of textualized 3D meshes, achieving state-of-the-art mesh reconstruction fidelity and shape comprehension and outperforming baselines such as LLaMA-Mesh. This work establishes a scalable, structure-aware paradigm for language-driven 3D generation.

📝 Abstract

We present MeshLLM, a novel framework that leverages large language models (LLMs) to understand and generate text-serialized 3D meshes. Our approach addresses two key limitations of existing methods: the limited dataset scale imposed by LLMs' token-length limits, and the loss of 3D structural information during mesh serialization. We introduce a Primitive-Mesh decomposition strategy, which divides 3D meshes into structurally meaningful subunits. This enables the creation of a large-scale dataset with 1500k+ samples, almost 50 times larger than those of previous methods, a scale that aligns better with LLM scaling laws. Furthermore, we propose two training strategies, inferring face connectivity from vertices and local mesh assembly, which significantly enhance the LLMs' ability to capture mesh topology and spatial structures. Experiments show that MeshLLM outperforms the state-of-the-art LLaMA-Mesh in both mesh generation quality and shape understanding, highlighting its great potential in processing text-serialized 3D meshes.
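To make "text-serialized 3D mesh" concrete, the following is a minimal sketch of how a triangle mesh could be written as plain text and read back. It assumes an OBJ-style layout (`v x y z` and 1-based `f i j k` lines) and integer coordinate quantization with 64 bins; the abstract does not specify the serialization format, so the function names `quantize`, `serialize_mesh`, and `parse_mesh` and the bin count are illustrative assumptions, not MeshLLM's actual pipeline.

```python
from typing import List, Tuple

Vertex = Tuple[float, float, float]
Face = Tuple[int, int, int]  # zero-based vertex indices


def quantize(value: float, lo: float, hi: float, bins: int = 64) -> int:
    """Map a coordinate in [lo, hi] to an integer bin in [0, bins - 1]."""
    if hi == lo:
        return 0
    scaled = (value - lo) / (hi - lo)
    return min(bins - 1, max(0, int(scaled * bins)))


def serialize_mesh(vertices: List[Vertex], faces: List[Face], bins: int = 64) -> str:
    """Write a mesh as OBJ-style text with quantized integer coordinates.

    Assumption: a single global min/max is used for all three axes.
    """
    coords = [c for v in vertices for c in v]
    lo, hi = min(coords), max(coords)
    lines = []
    for v in vertices:
        x, y, z = (quantize(c, lo, hi, bins) for c in v)
        lines.append(f"v {x} {y} {z}")
    for f in faces:
        # OBJ face indices are 1-based.
        lines.append("f " + " ".join(str(i + 1) for i in f))
    return "\n".join(lines)


def parse_mesh(text: str) -> Tuple[List[Tuple[int, int, int]], List[Face]]:
    """Read the text form back into quantized vertices and zero-based faces."""
    vertices, faces = [], []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "v":
            vertices.append(tuple(int(p) for p in parts[1:4]))
        elif parts[0] == "f":
            faces.append(tuple(int(p) - 1 for p in parts[1:4]))
    return vertices, faces


# A tetrahedron: 4 vertices, 4 triangular faces.
tetra_vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
tetra_faces = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
tetra_text = serialize_mesh(tetra_vertices, tetra_faces)
```

Every vertex and face costs tokens in this representation, which is why a fixed context window caps the size of meshes that can be included whole, and why splitting meshes into smaller subunits increases the number of usable samples.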

Problem

Research questions and friction points this paper is trying to address.

Enhance LLMs' ability to understand and generate 3D meshes

Overcome limited dataset scale and the loss of structural information in mesh serialization

Improve LLMs' ability to capture mesh topology and spatial structure

Innovation

Methods, ideas, or system contributions that make the work stand out.

Primitive-Mesh decomposition for meaningful subunits

Large-scale dataset creation with 1500k+ samples

Face connectivity inference and local mesh assembly
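The three ideas above can be illustrated with a small sketch. It splits a mesh into subunits and then builds a "vertices in, faces out" training pair for one subunit. The splitting rule used here (connected components over shared vertices, via union-find) is a simple stand-in chosen for illustration; the summary does not describe how Primitive-Mesh decomposition actually partitions a mesh, and the helper names `split_into_subunits` and `connectivity_sample` are hypothetical.

```python
from collections import defaultdict
from typing import List, Tuple

Vertex = Tuple[int, int, int]
Face = Tuple[int, int, int]  # zero-based vertex indices


def split_into_subunits(num_vertices: int, faces: List[Face]) -> List[List[Face]]:
    """Group faces into connected components (stand-in for a real decomposition)."""
    parent = list(range(num_vertices))

    def find(i: int) -> int:
        # Path-halving union-find lookup.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(a: int, b: int) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for a, b, c in faces:
        union(a, b)
        union(b, c)

    groups = defaultdict(list)
    for face in faces:
        groups[find(face[0])].append(face)
    return list(groups.values())


def connectivity_sample(vertices: List[Vertex], faces: List[Face]) -> Tuple[str, str]:
    """Build a (prompt, target) pair: vertex lines as input, face lines as output.

    Vertices are re-indexed locally so each subunit is self-contained.
    """
    used = sorted({i for f in faces for i in f})
    local = {g: l for l, g in enumerate(used)}
    prompt = "\n".join("v {} {} {}".format(*vertices[i]) for i in used)
    target = "\n".join(
        "f " + " ".join(str(local[i] + 1) for i in f) for f in faces
    )
    return prompt, target


# Two disjoint triangles form two subunits.
demo_vertices = [(0, 0, 0), (8, 0, 0), (0, 8, 0), (32, 32, 32), (40, 32, 32), (32, 40, 32)]
demo_faces = [(0, 1, 2), (3, 4, 5)]
subunits = split_into_subunits(len(demo_vertices), demo_faces)
prompt, target = connectivity_sample(demo_vertices, subunits[1])
```

In this framing, a face connectivity task asks the model to produce `target` given `prompt`, and a local assembly task would ask it to recombine several such subunits into a larger mesh. Both task formats shown here are assumptions about how such training data might look.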

🔎 Similar Papers

When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models