LN3Diff: Scalable Latent Neural Fields Diffusion for Speedy 3D Generation

📅 2024-03-18
🏛️ European Conference on Computer Vision
📈 Citations: 70
✨ Influential: 4
📄 PDF
🤖 AI Summary
Current 3D diffusion models suffer from 3D training-data scarcity and struggle to balance generation quality against efficiency. To address this, the paper proposes an end-to-end 3D diffusion framework: a 3D-aware variational autoencoder (VAE) maps input images into a compact, structured 3D latent space; a transformer-based decoder maps these latents to high-capacity Neural Radiance Fields (NeRFs); and a diffusion model is trained directly in this 3D-aware latent space. The authors present this as the first diffusion model trained explicitly in an interpretable 3D latent space, which removes the need for per-instance optimization at inference time. On ShapeNet, the method achieves state-of-the-art 3D generation performance, and it also performs strongly on monocular 3D reconstruction and conditional 3D generation. Compared to existing 3D diffusion approaches, the framework delivers superior inference speed and generalization.

📝 Abstract
The field of neural rendering has witnessed significant progress with advancements in generative models and differentiable rendering techniques. Though 2D diffusion has achieved success, a unified 3D diffusion pipeline remains unsettled. This paper introduces a novel framework called LN3Diff to address this gap and enable fast, high-quality, and generic conditional 3D generation. Our approach harnesses a 3D-aware architecture and variational autoencoder (VAE) to encode the input image into a structured, compact, and 3D latent space. The latent is decoded by a transformer-based decoder into a high-capacity 3D neural field. Through training a diffusion model on this 3D-aware latent space, our method achieves state-of-the-art performance on ShapeNet for 3D generation and demonstrates superior performance in monocular 3D reconstruction and conditional 3D generation across various datasets. Moreover, it surpasses existing 3D diffusion methods in terms of inference speed, requiring no per-instance optimization. Our proposed LN3Diff presents a significant advancement in 3D generative modeling and holds promise for various applications in 3D vision and graphics tasks.
Problem

Research questions and friction points this paper is trying to address.

Fast, high-quality, generic conditional 3D generation remains difficult
No unified 3D diffusion pipeline exists, despite the success of 2D diffusion
Existing 3D diffusion methods are slow at inference, often requiring per-instance optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses 3D-aware VAE for compact latent encoding
Employs transformer decoder for 3D neural field generation
Trains diffusion model on latent space for fast inference
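The three components above form a single pipeline: encode an image into a compact latent, run diffusion purely in that latent space, and decode the result into a neural field. The following is a minimal numpy sketch of that data flow, not the paper's implementation: all dimensions, the linear encoder/decoder stand-ins, and the toy denoising rule are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions -- the paper's actual latent layout and
# neural-field parameterization differ; these sizes are for illustration.
IMG_DIM = 64 * 64 * 3        # flattened input image
LATENT_DIM = 256             # compact, structured 3D latent
FIELD_DIM = 3 * 32 * 32      # decoded neural-field (triplane-style) features

# Stand-ins for the learned networks (random linear maps, not trained models).
W_enc = rng.standard_normal((LATENT_DIM, IMG_DIM)) * 0.01
W_dec = rng.standard_normal((FIELD_DIM, LATENT_DIM)) * 0.01

def encode(image_flat):
    """3D-aware VAE encoder stand-in: image -> compact 3D latent."""
    return W_enc @ image_flat

def denoise_step(z_t, t, T):
    """Toy denoising update; a real model would subtract predicted noise."""
    alpha = 1.0 - t / (T + 1)
    return alpha * z_t

def sample_latent(T=50):
    """Diffusion prior run entirely in latent space -- no per-instance optimization."""
    z = rng.standard_normal(LATENT_DIM)
    for t in reversed(range(1, T + 1)):
        z = denoise_step(z, t, T)
    return z

def decode(z):
    """Transformer decoder stand-in: latent -> neural-field (NeRF) parameters."""
    return W_dec @ z

# Monocular reconstruction path: image -> latent -> field.
image = rng.standard_normal(IMG_DIM)
field_recon = decode(encode(image))

# Generation path: sampled latent -> field.
field_gen = decode(sample_latent())

print(field_recon.shape, field_gen.shape)
```

Because sampling happens in the low-dimensional latent space and decoding is a single feed-forward pass, inference cost is fixed per sample, which is the source of the speed advantage over optimization-based 3D diffusion methods.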