Complete Gaussian Splats from a Single Image with Denoising Diffusion Models

📅 2025-08-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenges of unobservable occluded regions and blurry/distorted reconstructions caused by deterministic single-shot regression in single-image 3D scene reconstruction, this paper proposes the first generative modeling framework for completing Gaussian Splatting (GS) from a single image. Methodologically, we design a self-supervised variational autoencoder to learn a latent space of 3D Gaussian point clouds without ground-truth supervision, and train a latent diffusion model on this space to enable diverse, joint geometry-appearance completion of unobserved surfaces. Our contributions are: (1) the first integration of generative priors into GS-based reconstruction, effectively mitigating ambiguity in occlusion modeling under single-view constraints; and (2) support for diverse sampling, significantly improving 360° rendering realism and structural completeness. Experiments demonstrate that our method consistently outperforms state-of-the-art single-image reconstruction approaches in both occlusion completion quality and panoramic rendering fidelity.

📝 Abstract
Gaussian splatting typically requires dense observations of the scene and can fail to reconstruct occluded and unobserved areas. We propose a latent diffusion model to reconstruct a complete 3D scene with Gaussian splats, including the occluded parts, from only a single image during inference. Completing the unobserved surfaces of a scene is challenging due to the ambiguity of the plausible surfaces. Conventional methods use a regression-based formulation to predict a single "mode" for occluded and out-of-frustum surfaces, leading to blurriness, implausibility, and failure to capture multiple possible explanations. Thus, they often address this problem partially, focusing either on objects isolated from the background, reconstructing only visible surfaces, or failing to extrapolate far from the input views. In contrast, we propose a generative formulation to learn a distribution of 3D representations of Gaussian splats conditioned on a single input image. To address the lack of ground-truth training data, we propose a Variational AutoReconstructor to learn a latent space only from 2D images in a self-supervised manner, over which a diffusion model is trained. Our method generates faithful reconstructions and diverse samples with the ability to complete the occluded surfaces for high-quality 360-degree renderings.
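The pipeline the abstract describes, sampling a scene latent with a diffusion model conditioned on the input image and decoding it to Gaussian-splat parameters, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the dimensions, the toy `denoiser`, and the random linear `decode_to_splats` are all hypothetical stand-ins for the trained networks.

```python
import numpy as np

# Hypothetical sizes, not taken from the paper.
LATENT_DIM = 16
GAUSSIAN_PARAMS = 14   # xyz(3) + scale(3) + rotation quat(4) + opacity(1) + rgb(3)
NUM_SPLATS = 8

rng = np.random.default_rng(0)

def denoiser(z_t, t, cond):
    """Toy noise predictor standing in for the trained conditional network."""
    return 0.1 * (z_t - cond)  # pull the latent toward the image condition

def ddpm_sample(cond, steps=50):
    """Ancestral DDPM-style sampling of a scene latent given an image embedding."""
    betas = np.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    z = rng.standard_normal(LATENT_DIM)          # start from pure noise
    for t in reversed(range(steps)):
        eps = denoiser(z, t, cond)
        z = (z - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:                                 # add noise except at the last step
            z = z + np.sqrt(betas[t]) * rng.standard_normal(LATENT_DIM)
    return z

def decode_to_splats(z):
    """Placeholder decoder mapping a scene latent to per-splat Gaussian parameters."""
    W = rng.standard_normal((LATENT_DIM, NUM_SPLATS * GAUSSIAN_PARAMS)) / np.sqrt(LATENT_DIM)
    return (z @ W).reshape(NUM_SPLATS, GAUSSIAN_PARAMS)

cond = rng.standard_normal(LATENT_DIM)            # stand-in for an image embedding
splats = decode_to_splats(ddpm_sample(cond))
print(splats.shape)  # → (8, 14)
```

Because sampling starts from fresh noise each time, repeated calls with the same `cond` yield different plausible completions, which is the property the generative formulation exploits for occluded surfaces.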
Problem

Research questions and friction points this paper is trying to address.

Reconstruct complete 3D scenes with Gaussian splats from single images
Address ambiguity in plausible surfaces for occluded and unobserved areas
Generate diverse 3D representations overcoming regression-based limitations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Latent diffusion model for Gaussian splats
Self-supervised learning from 2D images
Generative completion of occluded surfaces