VidSplat: Gaussian Splatting Reconstruction with Geometry-Guided Video Diffusion Priors

📅 2026-05-11

🤖 AI Summary
This work addresses the challenge of reconstructing complete and accurate 3D scenes from extremely sparse inputs, such as a single image, where Gaussian Splatting struggles, particularly in occluded or unobserved regions. The authors propose a training-free generative reconstruction framework that integrates video diffusion models into the Gaussian Splatting pipeline. Their approach combines a geometry-guided, stage-wise denoising strategy with a confidence-based iterative view completion mechanism to synthesize geometrically consistent novel views that fill in missing scene content. With adaptive camera trajectory sampling and confidence-weighted optimization, the method is reported to outperform prior approaches on widely used sparse-view benchmarks, producing more complete reconstructions from highly sparse observations.
📝 Abstract
Gaussian Splatting has achieved remarkable progress in multi-view surface reconstruction, yet it degrades notably when only a few views are available. Although recent efforts alleviate this issue by enhancing multi-view consistency to produce plausible surfaces, they struggle to infer unseen, occluded, or weakly constrained regions beyond the input coverage. To address this limitation, we present VidSplat, a training-free generative reconstruction framework that leverages powerful video diffusion priors to iteratively synthesize novel views that compensate for missing input coverage, and thereby recovers complete 3D scenes from sparse inputs. Specifically, we tackle two key challenges in integrating generation and reconstruction effectively. First, for 3D-consistent generation, we devise a training-free, stage-wise denoising strategy that adaptively guides the denoising direction toward the underlying geometry using the rendered RGB and mask images. Second, to enhance the reconstruction, we develop an iterative mechanism that samples camera trajectories, explores unobserved regions, synthesizes novel views, and supplements training through confidence-weighted refinement. VidSplat is robust to sparse inputs, down to a single image. Extensive experiments on widely used benchmarks demonstrate superior performance in sparse-view scene reconstruction.
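The abstract mentions confidence-weighted refinement without giving its form. As a rough illustration only, the sketch below shows one common way such weighting can work: a per-pixel L1 photometric error between a rendered view and a synthesized target view, scaled by a confidence map so that unreliable generated pixels contribute less. The L1 error, the normalization by total confidence, and every name here (confidence_weighted_l1, rendered, target, confidence) are assumptions for illustration, not the paper's actual loss.

```python
from typing import List

# Hypothetical image type: grayscale H x W grid with values in [0, 1].
Image = List[List[float]]


def confidence_weighted_l1(rendered: Image, target: Image, confidence: Image) -> float:
    """Mean absolute error with each pixel weighted by a confidence in [0, 1].

    Assumed form, not taken from the paper: sum(c * |r - t|) / sum(c).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for r_row, t_row, c_row in zip(rendered, target, confidence):
        for r, t, c in zip(r_row, t_row, c_row):
            weighted_sum += c * abs(r - t)
            weight_total += c
    # If every pixel has zero confidence, the view contributes nothing.
    return weighted_sum / weight_total if weight_total > 0 else 0.0


# Usage: the first pixel disagrees with the target, the second matches it.
rendered = [[0.0, 1.0]]
target = [[1.0, 1.0]]
trusted_first = confidence_weighted_l1(rendered, target, [[1.0, 0.0]])
equal_trust = confidence_weighted_l1(rendered, target, [[0.5, 0.5]])
```

With this form, a synthesized pixel whose confidence is zero has no influence on the refinement, while input-view pixels (confidence near one) dominate; a real pipeline would apply the same idea to RGB tensors on the GPU.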
Problem

Research questions and friction points this paper is trying to address.

Gaussian Splatting
sparse-view reconstruction
3D scene completion
occluded regions
multi-view consistency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Gaussian Splatting
video diffusion priors
sparse-view reconstruction
geometry-guided generation
training-free framework
Jimin Tang
School of Software, Tsinghua University, China
Wenyuan Zhang
Tsinghua University
3D Computer Vision, 3D Reconstruction, Video Generation
Junsheng Zhou
Tsinghua University
3D computer vision
Zian Huang
School of Software, Tsinghua University, China
Kanle Shi
Kuaishou Technology, China
Shenkun Xu
Kuaishou Technology, China
Yu-Shen Liu
School of Software, Tsinghua University, China
Zhizhong Han
Assistant Professor of Computer Science at Wayne State University
3D Computer Vision, Digital Geometry Processing, Artificial Intelligence, ARVR