Kiss3DGen: Repurposing Image Diffusion Models for 3D Asset Generation

📅 2025-03-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing 3D generative models rely heavily on large-scale 3D-annotated data and struggle to leverage the knowledge in pre-trained 2D diffusion models. Method: Kiss3DGen is a zero-shot 3D generation framework that requires no 3D supervision. It fine-tunes an off-the-shelf 2D diffusion model (e.g., Stable Diffusion) to jointly synthesize multi-view RGB images and their corresponding normal maps, termed "3D Bundle Images", and enforces joint geometry-texture modeling via multi-view consistency and normal-map guidance. The framework then performs Poisson surface reconstruction and UV mapping to produce topologically valid, high-fidelity meshes. Contribution/Results: Kiss3DGen introduces a lightweight tiled multi-view plus normal-map representation that serves as an implicit 3D representation, bypassing explicit 3D parameterization. It enables fine-grained editing and quality enhancement while accelerating inference by over 3x, and, with zero 3D supervision, significantly outperforms state-of-the-art text-to-3D methods.

📝 Abstract
Diffusion models have achieved great success in generating 2D images. However, the quality and generalizability of 3D content generation remain limited. State-of-the-art methods often require large-scale 3D assets for training, which are challenging to collect. In this work, we introduce Kiss3DGen (Keep It Simple and Straightforward in 3D Generation), an efficient framework for generating, editing, and enhancing 3D objects by repurposing a well-trained 2D image diffusion model for 3D generation. Specifically, we fine-tune a diffusion model to generate a "3D Bundle Image", a tiled representation composed of multi-view images and their corresponding normal maps. The normal maps are then used to reconstruct a 3D mesh, and the multi-view images provide texture mapping, resulting in a complete 3D model. This simple method effectively transforms the 3D generation problem into a 2D image generation task, maximizing the utilization of knowledge in pretrained diffusion models. Furthermore, we demonstrate that our Kiss3DGen model is compatible with various diffusion model techniques, enabling advanced features such as 3D editing, mesh and texture enhancement, etc. Through extensive experiments, we demonstrate the effectiveness of our approach, showcasing its ability to produce high-quality 3D models efficiently.
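The tiled "3D Bundle Image" described above can be pictured as a single image grid that packs multi-view renders and their normal maps together. The sketch below, assuming a simple two-row layout (RGB views over normal maps) which is illustrative and not necessarily the paper's exact tiling, shows how such a bundle could be assembled with NumPy:

```python
import numpy as np

def make_bundle_image(rgb_views, normal_maps):
    """Tile N multi-view RGB images (top row) and their normal maps
    (bottom row) into a single grid, in the spirit of a '3D Bundle Image'.

    rgb_views, normal_maps: equal-length lists of HxWx3 uint8 arrays.
    The two-row layout is an illustrative assumption, not the paper's
    exact format.
    """
    assert len(rgb_views) == len(normal_maps)
    top = np.concatenate(rgb_views, axis=1)       # H x (N*W) x 3
    bottom = np.concatenate(normal_maps, axis=1)  # H x (N*W) x 3
    return np.concatenate([top, bottom], axis=0)  # (2H) x (N*W) x 3

# Example: 4 views of 256x256 yield a 512x1024 bundle image.
views = [np.zeros((256, 256, 3), np.uint8) for _ in range(4)]
normals = [np.full((256, 256, 3), 128, np.uint8) for _ in range(4)]
bundle = make_bundle_image(views, normals)
print(bundle.shape)  # (512, 1024, 3)
```

Because the bundle is just an ordinary image, a 2D diffusion model can be fine-tuned to generate it directly; the normal-map tiles are then sliced back out for mesh reconstruction and the RGB tiles for texturing.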
Problem

Research questions and friction points this paper is trying to address.

Limited quality and generalizability in 3D content generation.
Challenges in collecting large-scale 3D assets for training.
Lack of an efficient way to repurpose 2D image diffusion models for 3D generation.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Repurposes 2D diffusion models for 3D generation.
Generates a "3D Bundle Image" for mesh reconstruction and texturing.
Enables 3D editing and texture enhancement.