DiffSplat: Repurposing Image Diffusion Models for Scalable Gaussian Splat Generation

📅 2025-01-28
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Existing text- or single-image-conditioned 3D Gaussian Splatting (3DGS) generation methods struggle to simultaneously leverage powerful 2D diffusion priors and enforce rigorous 3D geometric consistency. Method: We propose the first native 3DGS generation framework tailored for large-scale text-to-image diffusion models. It features (1) a lightweight multi-view reconstruction module that constructs a differentiable splatting grid, and (2) joint optimization of a diffusion prior loss and an explicit 3D rendering loss to enforce cross-view geometric and appearance consistency. Contribution/Results: Our approach is the first to directly and seamlessly transfer pre-trained image diffusion models into the 3D Gaussian parameter space, bypassing any intermediate 2D-to-3D representations. Experiments demonstrate state-of-the-art performance in both text- and image-conditioned 3D generation and downstream tasks, achieving superior fidelity, cross-view consistency, and computational efficiency.
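The joint objective described above (regular diffusion loss on the splat grid plus a 3D rendering loss across views) can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the tensor names, the L2/L1 choice per term, and the weighting factor `lam` are all assumptions.

```python
import numpy as np

def diffsplat_training_loss(pred_noise, true_noise, rendered_views, gt_views, lam=1.0):
    """Hedged sketch of a joint objective in the spirit of DiffSplat:
    a standard diffusion (noise-prediction) loss on the Gaussian splat grid,
    plus a rendering loss on views rasterized from the denoised splats that
    encourages cross-view 3D consistency. `lam` balances the two terms
    (illustrative assumption, not the paper's value)."""
    # Regular diffusion loss: MSE between predicted and true noise on the grid.
    l_diff = np.mean((pred_noise - true_noise) ** 2)
    # Rendering loss: per-pixel error between rendered and ground-truth views.
    l_render = np.mean(np.abs(rendered_views - gt_views))
    return l_diff + lam * l_render
```

In practice the rendered views would come from a differentiable Gaussian-splatting rasterizer applied to the denoised splat parameters; here plain arrays stand in for both.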

Technology Category

Application Category

๐Ÿ“ Abstract
Recent advancements in 3D content generation from text or a single image struggle with limited high-quality 3D datasets and inconsistency from 2D multi-view generation. We introduce DiffSplat, a novel 3D generative framework that natively generates 3D Gaussian splats by taming large-scale text-to-image diffusion models. It differs from previous 3D generative models by effectively utilizing web-scale 2D priors while maintaining 3D consistency in a unified model. To bootstrap the training, a lightweight reconstruction model is proposed to instantly produce multi-view Gaussian splat grids for scalable dataset curation. In conjunction with the regular diffusion loss on these grids, a 3D rendering loss is introduced to facilitate 3D coherence across arbitrary views. The compatibility with image diffusion models enables seamless adaptations of numerous techniques for image generation to the 3D realm. Extensive experiments reveal the superiority of DiffSplat in text- and image-conditioned generation tasks and downstream applications. Thorough ablation studies validate the efficacy of each critical design choice and provide insights into the underlying mechanism.
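The "multi-view Gaussian splat grid" mentioned in the abstract can be illustrated with a small sketch: per-view Gaussian-parameter maps are tiled into a single image-like tensor so a 2D diffusion model can denoise all views jointly. The grid layout (2×2) and the function name are illustrative assumptions, not details from the paper.

```python
import numpy as np

def splat_grid(per_view_params, rows=2, cols=2):
    """Tile per-view Gaussian-parameter maps of shape (V, H, W, C) into one
    (rows*H, cols*W, C) 'splat grid', an image-like layout that a text-to-image
    diffusion model can process directly. C would hold per-pixel Gaussian
    attributes (e.g. position, scale, rotation, opacity, color); the 2x2
    arrangement here is an assumption for illustration."""
    v, h, w, c = per_view_params.shape
    assert v == rows * cols, "number of views must fill the grid exactly"
    # (rows, cols, H, W, C) -> (rows, H, cols, W, C) -> (rows*H, cols*W, C)
    grid = per_view_params.reshape(rows, cols, h, w, c)
    grid = grid.transpose(0, 2, 1, 3, 4).reshape(rows * h, cols * w, c)
    return grid
```

The inverse reshape recovers the per-view maps after denoising, so the same 2D U-Net weights operate on all views at once.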
Problem

Research questions and friction points this paper is trying to address.

Scarcity of High-Quality 3D Data
Multi-View Consistency of 2D Generation
Text-to-3D Generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

DiffSplat
View-Consistent Multi-View Generation
Joint Diffusion and 3D Rendering Loss
🔎 Similar Papers
No similar papers found.