RealisticDreamer: Guidance Score Distillation for Few-shot Gaussian Splatting

📅 2025-11-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
3D Gaussian Splatting (3DGS) overfits when trained on sparse input views because intermediate views receive no supervision. This paper proposes Guidance Score Distillation (GSD), a score-distillation framework that transfers the multi-view consistency priors of pre-trained video diffusion models into 3DGS optimization. A unified guidance mechanism combines depth warping with semantic image features to correct the diffusion model's noise prediction, reducing the score-distillation bias caused by object motion and random camera trajectories. Supervising rendered images from multiple neighboring views keeps the update direction aligned with the correct camera pose and accurate geometry. The authors report that the method outperforms existing approaches across multiple sparse-view datasets.

📝 Abstract
3D Gaussian Splatting (3DGS) has recently gained great attention in 3D scene representation for its high-quality real-time rendering capabilities. However, when the input comprises sparse training views, 3DGS is prone to overfitting, primarily due to the lack of intermediate-view supervision. Inspired by the recent success of Video Diffusion Models (VDMs), we propose a framework called Guidance Score Distillation (GSD) to extract the rich multi-view consistency priors from pretrained VDMs. Building on the insights from Score Distillation Sampling (SDS), GSD supervises rendered images from multiple neighboring views, guiding the Gaussian splatting representation towards the generative direction of the VDM. However, the generative direction often involves object motion and random camera trajectories, making direct supervision challenging in the optimization process. To address this problem, we introduce a unified guidance form to correct the noise prediction result of the VDM. Specifically, we incorporate both a depth warp guidance based on real depth maps and a guidance based on semantic image features, ensuring that the score update direction from the VDM aligns with the correct camera pose and accurate geometry. Experimental results show that our method outperforms existing approaches across multiple datasets.
Problem

Research questions and friction points this paper is trying to address.

Addresses overfitting in sparse-view 3D Gaussian Splatting optimization
Corrects generative noise predictions using depth and semantic guidance
Extracts multi-view consistency priors from pretrained video diffusion models
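
For reference, the score-distillation idea behind these points can be written compactly. The first equation is the standard SDS gradient in its commonly cited form, where x = g(θ) denotes the rendered view(s) and conditioning inputs are omitted for brevity. The second is a generic guidance-style correction of the noise prediction; its exact form, the scale s, and the loss L_guide are assumptions made here for illustration and are not taken from the paper.

```latex
% Standard SDS gradient; x = g(\theta) is the rendering, \epsilon_\phi the diffusion noise predictor
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}
  = \mathbb{E}_{t,\epsilon}\!\left[ w(t)\,\bigl(\epsilon_\phi(x_t; t) - \epsilon\bigr)\,
    \frac{\partial x}{\partial \theta} \right],
\qquad x_t = \alpha_t x + \sigma_t \epsilon

% Generic guidance correction (assumed form): shift the predicted noise along the
% gradient of a guidance loss, e.g. a depth-warp or semantic-feature discrepancy
\hat{\epsilon}_\phi(x_t; t)
  = \epsilon_\phi(x_t; t) + s\,\sigma_t\,\nabla_{x_t}\mathcal{L}_{\mathrm{guide}}(x_t)
```

Under this reading, substituting the corrected prediction for the raw one in the SDS gradient is what steers the update away from motion or camera drift hallucinated by the video model.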
Innovation

Methods, ideas, or system contributions that make the work stand out.

Guidance Score Distillation from Video Diffusion Models
Depth warp guidance using real depth maps
Semantic image features for geometry alignment
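
To make the depth-warp idea concrete, here is a minimal NumPy sketch of warping pixels from one view into a neighboring view using a depth map. It is not the paper's implementation: the function name `warp_pixels`, the pinhole intrinsics `K`, and the source-to-target pose `(R, t)` are all assumptions chosen for illustration.

```python
import numpy as np

def warp_pixels(depth, K, R, t):
    """Project every source pixel into a target view (hypothetical helper).

    depth: (H, W) depth map of the source view
    K:     (3, 3) pinhole intrinsics, assumed shared by both views
    R, t:  rotation (3, 3) and translation (3,) mapping source-camera
           coordinates to target-camera coordinates
    Returns target pixel coordinates (H, W, 2) and target depth (H, W).
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    # Homogeneous pixel coordinates, one row per pixel
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(np.float64)
    # Back-project to 3D points in the source camera frame
    pts_src = (pix @ np.linalg.inv(K).T) * depth.reshape(-1, 1)
    # Move the points into the target camera frame
    pts_tgt = pts_src @ R.T + t
    # Project with the intrinsics; points with z <= 0 lie behind the camera
    # and should be masked out by the caller
    proj = pts_tgt @ K.T
    z = proj[:, 2]
    uv = proj[:, :2] / z[:, None]
    return uv.reshape(H, W, 2), z.reshape(H, W)
```

The warped coordinates can then be used to resample the source image at the target pose, giving a geometry-consistent reference against which a rendered or denoised frame could be compared.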
Ruocheng Wu
University of Electronic Science and Technology of China
Haolan He
University of Electronic Science and Technology of China
Yufei Wang
Nanyang Technological University
Zhihao Li
Nanyang Technological University
Bihan Wen
Associate Professor, Nanyang Technological University
Machine Learning, Image Processing, Computational Imaging, Computer Vision, Trustworthy AI