🤖 AI Summary
Sparse-view 3D scene reconstruction suffers from severe geometric incompleteness and texture distortion because the available observations are highly insufficient. To address this, we propose the first framework that deeply integrates Vision Foundation Models (VFMs) into the entire 3D Gaussian Splatting (3DGS) pipeline: DUSt3R generates a dense, redundancy-free initial point cloud; DINOv2 and Depth Anything jointly extract cross-view semantic and depth priors to guide Gaussian parameter initialization and joint optimization; and differentiable rendering enables geometry-appearance co-compensation in unobserved regions. Evaluated on the LLFF, DTU, and Tanks and Temples benchmarks with ≤5 input views, our method significantly outperforms state-of-the-art approaches in rendering quality, with notable gains in geometric completeness and texture fidelity.
📝 Abstract
Sparse-view scene reconstruction is fundamentally constrained by limited observational data: the few available views leave large portions of the scene unobserved, so existing methods produce incomplete, suboptimal reconstructions. To address this, we present Intern-GS, a novel approach that leverages the rich prior knowledge of vision foundation models to enhance sparse-view Gaussian Splatting and enable high-quality scene reconstruction. Specifically, Intern-GS uses vision foundation models to guide both the initialization and the optimization of 3D Gaussian Splatting, directly addressing the limitations of sparse inputs. For initialization, our method employs DUSt3R to generate a dense, non-redundant Gaussian point cloud, alleviating the failures of traditional structure-from-motion (SfM) methods, which often struggle under sparse-view constraints. During optimization, vision foundation models predict depth and appearance for unobserved views, refining the 3D Gaussians to compensate for missing information in unseen regions. Extensive experiments demonstrate that Intern-GS achieves state-of-the-art rendering quality across diverse datasets spanning both forward-facing and large-scale scenes, including LLFF, DTU, and Tanks and Temples.
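The two-stage flow described above (dense, non-redundant initialization, then prior-guided refinement of unseen regions) can be sketched as a toy script. This is a minimal illustration, not the Intern-GS implementation: every function below (`dense_init_points`, `init_gaussians`, `prior_depth`, `optimize`) is a hypothetical stand-in for the real components (DUSt3R, DINOv2, Depth Anything, differentiable 3DGS rendering), with random points and a scalar depth target in place of real model outputs.

```python
import numpy as np

def dense_init_points(n_views, pts_per_view, seed=0):
    """Stand-in for DUSt3R-style initialization: a dense per-view point
    cloud, deduplicated by voxel hashing to keep it non-redundant."""
    rng = np.random.default_rng(seed)
    pts = rng.uniform(-1.0, 1.0, size=(n_views * pts_per_view, 3))
    voxel = np.floor(pts / 0.1).astype(int)          # coarse voxel grid
    _, keep = np.unique(voxel, axis=0, return_index=True)
    return pts[np.sort(keep)]                        # one point per voxel

def init_gaussians(points):
    """Each surviving point seeds one isotropic Gaussian (toy parameters)."""
    return {
        "mean": points.copy(),
        "scale": np.full(len(points), 0.05),
        "opacity": np.full(len(points), 0.5),
    }

def prior_depth(means):
    """Stand-in for a monocular depth prior on an unobserved view
    (here simply the z-coordinate plus a fixed offset)."""
    return means[:, 2] + 0.02

def optimize(gaussians, steps=200, lr=0.1):
    """Toy refinement: pull each Gaussian's depth toward the prior,
    mimicking prior-guided compensation of unseen regions."""
    target = prior_depth(gaussians["mean"])
    for _ in range(steps):
        residual = gaussians["mean"][:, 2] - target
        gaussians["mean"][:, 2] -= lr * residual     # gradient step on 0.5*r^2
    return float(np.mean(np.abs(gaussians["mean"][:, 2] - target)))

pts = dense_init_points(n_views=3, pts_per_view=500)
gaussians = init_gaussians(pts)
final_err = optimize(gaussians)
print(len(pts), final_err)
```

In the actual pipeline the optimization target comes from rendering the Gaussians into novel views and comparing against VFM depth/appearance predictions; the sketch only preserves the structure of that loop.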