🤖 AI Summary
Existing 3D generative methods suffer from slow optimization, irregular topology, noisy surfaces, and limited editability. To address these challenges, we propose the first 3D-native diffusion framework supporting both text and image conditioning, generating topologically regular coarse meshes in seconds. Our method introduces multi-view joint conditional modeling and a latent-space set representation to enhance geometric consistency across views. Furthermore, we pioneer a normal-field-driven geometric refinement mechanism that enables automated detail enhancement and intuitive, interactive user editing. Extensive experiments demonstrate that our approach significantly outperforms state-of-the-art methods in both qualitative and quantitative evaluations, producing high-fidelity, geometrically coherent meshes with rich surface details and flexible, real-time editability. The code and pretrained models are publicly released and integrate seamlessly into practical 3D modeling workflows.
📝 Abstract
We present a novel generative 3D modeling system, dubbed CraftsMan, which can generate high-fidelity 3D geometries with highly varied shapes, regular mesh topologies, and detailed surfaces, and, notably, allows for refining the geometry in an interactive manner. Despite significant advancements in 3D generation, existing methods still struggle with lengthy optimization processes, irregular mesh topologies, noisy surfaces, and difficulties in accommodating user edits, consequently impeding their widespread adoption in 3D modeling software. Our work is inspired by the craftsman, who usually roughs out the holistic figure of the work first and elaborates the surface details subsequently. Specifically, we employ a 3D-native diffusion model, which operates on a latent space learned from a set-based 3D representation, to generate coarse geometries with regular mesh topology in seconds. In particular, this process takes as input a text prompt or a reference image and leverages a powerful multi-view (MV) diffusion model to generate multiple views of the coarse geometry, which are fed into our MV-conditioned 3D diffusion model for generating the 3D geometry, significantly improving robustness and generalizability. Subsequently, a normal-based geometry refiner is used to significantly enhance the surface details. This refinement can be performed automatically or interactively with user-supplied edits. Extensive experiments demonstrate that our method produces 3D assets of superior quality compared with existing methods. HomePage: https://craftsman3d.github.io/, Code: https://github.com/wyysf-98/CraftsMan