LN3Diff: Scalable Latent Neural Fields Diffusion for Speedy 3D Generation

📅 2024-03-18
🏛️ European Conference on Computer Vision
📈 Citations: 70
✨ Influential: 4
📄 PDF
🤖 AI Summary
Current 3D diffusion models suffer from 3D training-data scarcity and struggle to balance generation quality against efficiency. To address this, the paper proposes an end-to-end 3D diffusion framework: a 3D-aware variational autoencoder (VAE) maps input images into a compact, structured 3D latent space; a transformer-based decoder maps these latents to high-capacity Neural Radiance Fields (NeRFs); and a diffusion model is trained directly in this 3D-aware latent space. The authors present this as the first diffusion model trained explicitly in an interpretable 3D latent space, which removes the need for per-instance optimization at inference time. On ShapeNet, the method achieves state-of-the-art 3D generation performance, and it also performs strongly on monocular 3D reconstruction and conditional 3D generation. Compared to existing 3D diffusion approaches, the framework delivers superior inference speed and generalization.

📝 Abstract
The field of neural rendering has witnessed significant progress with advancements in generative models and differentiable rendering techniques. Though 2D diffusion has achieved success, a unified 3D diffusion pipeline remains unsettled. This paper introduces a novel framework called LN3Diff to address this gap and enable fast, high-quality, and generic conditional 3D generation. Our approach harnesses a 3D-aware architecture and variational autoencoder (VAE) to encode the input image into a structured, compact, and 3D latent space. The latent is decoded by a transformer-based decoder into a high-capacity 3D neural field. Through training a diffusion model on this 3D-aware latent space, our method achieves state-of-the-art performance on ShapeNet for 3D generation and demonstrates superior performance in monocular 3D reconstruction and conditional 3D generation across various datasets. Moreover, it surpasses existing 3D diffusion methods in terms of inference speed, requiring no per-instance optimization. Our proposed LN3Diff presents a significant advancement in 3D generative modeling and holds promise for various applications in 3D vision and graphics tasks.
Problem

Research questions and friction points this paper is trying to address.

Fast, high-quality, generic conditional 3D generation remains difficult
No unified 3D diffusion pipeline exists, despite the success of 2D diffusion
Existing 3D diffusion methods are slow at inference, often requiring per-instance optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses 3D-aware VAE for compact latent encoding
Employs transformer decoder for 3D neural field generation
Trains diffusion model on latent space for fast inference
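The three components above form a single pipeline: encode an image into a compact latent, run diffusion purely in that latent space, and decode the result into a neural field. The following is a minimal numpy sketch of that data flow, not the paper's implementation: all dimensions, the linear encoder/decoder stand-ins, and the toy denoising rule are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions -- the paper's actual latent layout and
# neural-field parameterization differ; these sizes are for illustration.
IMG_DIM = 64 * 64 * 3        # flattened input image
LATENT_DIM = 256             # compact, structured 3D latent
FIELD_DIM = 3 * 32 * 32      # decoded neural-field (triplane-style) features

# Stand-ins for the learned networks (random linear maps, not trained models).
W_enc = rng.standard_normal((LATENT_DIM, IMG_DIM)) * 0.01
W_dec = rng.standard_normal((FIELD_DIM, LATENT_DIM)) * 0.01

def encode(image_flat):
    """3D-aware VAE encoder stand-in: image -> compact 3D latent."""
    return W_enc @ image_flat

def denoise_step(z_t, t, T):
    """Toy denoising update; a real model would subtract predicted noise."""
    alpha = 1.0 - t / (T + 1)
    return alpha * z_t

def sample_latent(T=50):
    """Diffusion prior run entirely in latent space -- no per-instance optimization."""
    z = rng.standard_normal(LATENT_DIM)
    for t in reversed(range(1, T + 1)):
        z = denoise_step(z, t, T)
    return z

def decode(z):
    """Transformer decoder stand-in: latent -> neural-field (NeRF) parameters."""
    return W_dec @ z

# Monocular reconstruction path: image -> latent -> field.
image = rng.standard_normal(IMG_DIM)
field_recon = decode(encode(image))

# Generation path: sampled latent -> field.
field_gen = decode(sample_latent())

print(field_recon.shape, field_gen.shape)
```

Because sampling happens in the low-dimensional latent space and decoding is a single feed-forward pass, inference cost is fixed per sample, which is the source of the speed advantage over optimization-based 3D diffusion methods.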