OctFusion: Octree-based Diffusion Models for 3D Shape Generation

📅 2024-08-27
🏛️ arXiv.org
📈 Citations: 10
✨ Influential: 2
🤖 AI Summary
Addressing the challenge of simultaneously achieving diversity, quality, efficiency, and topological correctness in 3D shape generation, this paper introduces OctFusion, the first diffusion model based on octree-structured implicit representations. Methodologically, it employs a multi-scale U-Net with shared weights to avoid cascaded diffusion; integrates an octree-based VAE, a hybrid implicit-explicit representation, and an efficient mesh extraction algorithm; and supports multi-condition inputs (e.g., text, sketch, category) as well as colored field modeling. Experiments demonstrate state-of-the-art performance on ShapeNet and Objaverse. On a single Nvidia RTX 4090 GPU, OctFusion generates arbitrary-resolution, manifold, continuous 3D meshes in just 2.5 seconds, while also enabling high-fidelity textured mesh generation. The code and pre-trained models are publicly released.

πŸ“ Abstract
Diffusion models have emerged as a popular method for 3D generation. However, it is still challenging for diffusion models to efficiently generate diverse and high-quality 3D shapes. In this paper, we introduce OctFusion, which can generate 3D shapes with arbitrary resolutions in 2.5 seconds on a single Nvidia 4090 GPU, and the extracted meshes are guaranteed to be continuous and manifold. The key components of OctFusion are the octree-based latent representation and the accompanying diffusion models. The representation combines the benefits of both implicit neural representations and explicit spatial octrees and is learned with an octree-based variational autoencoder. The proposed diffusion model is a unified multi-scale U-Net that enables weights and computation sharing across different octree levels and avoids the complexity of widely used cascaded diffusion schemes. We verify the effectiveness of OctFusion on the ShapeNet and Objaverse datasets and achieve state-of-the-art performances on shape generation tasks. We demonstrate that OctFusion is extendable and flexible by generating high-quality color fields for textured mesh generation and high-quality 3D shapes conditioned on text prompts, sketches, or category labels. Our code and pre-trained models are available at https://github.com/octree-nn/octfusion.
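The abstract's central design choice, one unified U-Net whose weights are reused across octree levels instead of a cascade of separately trained models, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: `denoise_step`, the toy linear-plus-residual map, and the per-depth node counts are all illustrative assumptions standing in for the real octree diffusion network.

```python
import numpy as np

rng = np.random.default_rng(0)

# One shared set of denoiser weights, reused at every octree level.
# A cascaded scheme would instead train a separate model per resolution.
shared_W = rng.standard_normal((8, 8)) * 0.1

def denoise_step(features, W):
    """Toy stand-in for one U-Net pass: shared linear map + residual, per node."""
    return features + np.tanh(features @ W)

def generate_multiscale(levels=(2, 4, 6), channels=8):
    """Apply the same weights to latents at coarse-to-fine octree depths."""
    outputs = {}
    for depth in levels:
        n_nodes = 8 ** (depth // 2)  # hypothetical octree node count at this depth
        latents = rng.standard_normal((n_nodes, channels))
        outputs[depth] = denoise_step(latents, shared_W)
    return outputs

out = generate_multiscale()
for depth, feats in out.items():
    print(depth, feats.shape)
```

The point of the sketch is that `shared_W` appears once: every resolution level reuses the same parameters, which is what lets the paper avoid the training and inference complexity of cascaded diffusion.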
Problem

Research questions and friction points this paper is trying to address.

Efficiently generate diverse, high-quality 3D shapes
Achieve continuous, manifold meshes at arbitrary resolutions
Enable text-, sketch-, or category-conditioned 3D generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Octree-based latent representation for 3D shapes
Multi-scale U-Net diffusion model sharing weights
Generates continuous manifold meshes in 2.5 seconds
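The "hybrid implicit-explicit" idea behind the octree-based latent representation can be sketched as: explicit features stored on spatial cells, decoded into a continuous field by a small implicit network. The sketch below uses a dense grid as a stand-in for octree leaves and a toy two-layer MLP as the decoder; `query_sdf`, the grid size, and the nearest-cell lookup are assumptions for illustration, not the paper's actual representation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Explicit part: features stored on spatial cells (stand-in for octree leaves).
GRID = 4                  # cells per axis
FEAT_DIM = 16
leaf_features = rng.standard_normal((GRID, GRID, GRID, FEAT_DIM))

# Implicit part: a tiny MLP that decodes a feature into a signed distance.
W1 = rng.standard_normal((FEAT_DIM, 32)) * 0.1
W2 = rng.standard_normal((32, 1)) * 0.1

def query_sdf(points):
    """Decode a signed-distance value at query points in [0, 1)^3."""
    idx = np.clip((points * GRID).astype(int), 0, GRID - 1)
    feats = leaf_features[idx[:, 0], idx[:, 1], idx[:, 2]]  # explicit lookup
    hidden = np.tanh(feats @ W1)                            # implicit decode
    return (hidden @ W2).squeeze(-1)

pts = rng.random((5, 3))
print(query_sdf(pts).shape)
```

Because the field is defined by a continuous decoder rather than a fixed voxel grid, it can be sampled at any resolution, which is what enables the arbitrary-resolution, manifold mesh extraction the paper advertises.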