AI Summary
Existing 3D generation methods struggle to jointly synthesize multiple semantically coherent and geometrically independent 3D mesh parts from a single RGB image in an end-to-end manner, often resorting to two-stage segmentation-reconstruction pipelines. This work proposes the first input-segmentation-free framework for joint structured 3D synthesis. It introduces a disentangled compositional latent space to represent individual parts and a hierarchical attention mechanism that reconciles global structural consistency with local geometric detail, integrating both into a pre-trained 3D mesh diffusion transformer (DiT). Trained on a newly constructed part-level annotated dataset, the method significantly outperforms state-of-the-art approaches. Notably, it achieves, for the first time in decomposable 3D generation, plausible synthesis of occluded or unseen parts: those absent from the input image yet consistent with semantic and geometric priors.
Abstract
We introduce PartCrafter, the first structured 3D generative model that jointly synthesizes multiple semantically meaningful and geometrically distinct 3D meshes from a single RGB image. Unlike existing methods that either produce monolithic 3D shapes or follow two-stage pipelines, i.e., first segmenting an image and then reconstructing each segment, PartCrafter adopts a unified, compositional generation architecture that does not rely on pre-segmented inputs. Conditioned on a single image, it simultaneously denoises multiple 3D parts, enabling end-to-end part-aware generation of both individual objects and complex multi-object scenes. PartCrafter builds upon a pretrained 3D mesh diffusion transformer (DiT) trained on whole objects, inheriting the pretrained weights, encoder, and decoder, and introduces two key innovations: (1) A compositional latent space, where each 3D part is represented by a set of disentangled latent tokens; (2) A hierarchical attention mechanism that enables structured information flow both within individual parts and across all parts, ensuring global coherence while preserving part-level detail during generation. To support part-level supervision, we curate a new dataset by mining part-level annotations from large-scale 3D object datasets. Experiments show that PartCrafter outperforms existing approaches in generating decomposable 3D meshes, including parts that are not directly visible in input images, demonstrating the strength of part-aware generative priors for 3D understanding and synthesis. Code and training data will be released.
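The hierarchical attention described above alternates between within-part attention (preserving part-level detail) and across-part attention (ensuring global coherence). One common way to realize such structured information flow is with attention masks over the concatenated per-part token sequences. The sketch below illustrates that idea under assumed names; it is not the authors' implementation, and `part_attention_masks`, `num_parts`, and `tokens_per_part` are hypothetical.

```python
import numpy as np

def part_attention_masks(num_parts: int, tokens_per_part: int):
    """Build boolean attention masks for hierarchical attention.

    local_mask[i, j]  is True only if tokens i and j belong to the same
                      part (block-diagonal, within-part attention).
    global_mask       lets every token attend to every token
                      (across-part attention for global coherence).
    """
    n = num_parts * tokens_per_part
    part_id = np.arange(n) // tokens_per_part          # part index of each token
    local_mask = part_id[:, None] == part_id[None, :]  # block-diagonal structure
    global_mask = np.ones((n, n), dtype=bool)          # fully connected
    return local_mask, global_mask

# Example: 3 parts, 4 latent tokens each -> a 12x12 mask pair
local, glob = part_attention_masks(num_parts=3, tokens_per_part=4)
```

In a DiT-style stack, such masks would be passed to alternating self-attention layers, so part tokens first exchange information locally and then globally across all parts.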