MS-GS: Multi-Appearance Sparse-View 3D Gaussian Splatting in the Wild

📅 2025-09-18

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Existing NeRF and 3D Gaussian Splatting (3DGS) methods struggle with 3D reconstruction and novel view synthesis from sparse, in-the-wild photo collections that exhibit multiple appearances (e.g., across seasons or times of day), suffering from overfitting and geometric inconsistency. Method: We propose a local semantic region alignment strategy that fuses monocular depth priors with SfM-derived anchor points, coupled with multi-scale geometry-guided supervision at virtual views. Both components are integrated within the 3DGS framework to jointly optimize fine- and coarse-grained geometry. Contribution/Results: Our approach improves 3D consistency, mitigates overfitting, and generalizes effectively to complex real-world scenes without additional training. Evaluation on both a newly constructed dataset and public benchmarks shows substantial improvements over state-of-the-art methods in rendering quality and geometric fidelity.

📝 Abstract

In-the-wild photo collections often contain limited volumes of imagery and exhibit multiple appearances, e.g., taken at different times of day or seasons, posing significant challenges to scene reconstruction and novel view synthesis. Although recent adaptations of Neural Radiance Field (NeRF) and 3D Gaussian Splatting (3DGS) have improved in these areas, they tend to oversmooth and are prone to overfitting. In this paper, we present MS-GS, a novel framework designed with Multi-appearance capabilities in Sparse-view scenarios using 3DGS. To address the lack of support due to sparse initializations, our approach is built on the geometric priors elicited from monocular depth estimations. The key lies in extracting and utilizing local semantic regions with an algorithm anchored on Structure-from-Motion (SfM) points for reliable alignment and geometry cues. Then, to introduce multi-view constraints, we propose a series of geometry-guided supervision steps at virtual views, in both fine-grained and coarse schemes, to encourage 3D consistency and reduce overfitting. We also introduce a dataset and an in-the-wild experiment setting to set up more realistic benchmarks. We demonstrate that MS-GS achieves photorealistic renderings under various challenging sparse-view and multi-appearance conditions and outperforms existing approaches significantly across different datasets.
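The abstract does not spell out the SfM-anchored alignment algorithm, so the following is only a minimal sketch of one common way such an alignment can be done, not the authors' method. It assumes that monocular depth is known only up to an unknown scale and shift, that each semantic region gets its own scale and shift, and that those two values are fitted by ordinary least squares against the depths of the SfM points that fall inside the region. The function names, the `min_anchors` threshold, and the dictionary layout are all hypothetical.

```python
from typing import Dict, List, Tuple

Pair = Tuple[float, float]  # (monocular depth, SfM depth) at one anchor point


def fit_scale_shift(pairs: List[Pair]) -> Tuple[float, float]:
    """Closed-form least-squares fit of: sfm_depth ~ scale * mono_depth + shift."""
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two anchor points")
    mean_m = sum(m for m, _ in pairs) / n
    mean_s = sum(s for _, s in pairs) / n
    var_m = sum((m - mean_m) ** 2 for m, _ in pairs)
    if var_m == 0.0:
        raise ValueError("monocular depths are constant; scale is undetermined")
    cov = sum((m - mean_m) * (s - mean_s) for m, s in pairs)
    scale = cov / var_m
    shift = mean_s - scale * mean_m
    return scale, shift


def align_regions(
    anchors: Dict[str, List[Pair]], min_anchors: int = 2
) -> Dict[str, Tuple[float, float]]:
    """Fit one (scale, shift) per region; skip regions with too few anchors.

    `anchors` maps a hypothetical region id to its matched depth pairs.
    """
    params = {}
    for region_id, pairs in anchors.items():
        if len(pairs) < min_anchors:
            continue  # assumption: unreliable regions are simply left unaligned
        params[region_id] = fit_scale_shift(pairs)
    return params


def apply_alignment(mono_depth: float, scale: float, shift: float) -> float:
    """Map a monocular depth value into the SfM depth scale."""
    return scale * mono_depth + shift


if __name__ == "__main__":
    anchors = {
        "facade": [(0.2, 2.4), (0.5, 3.0), (0.8, 3.6)],
        "tree": [(0.1, 1.0)],  # only one anchor, so this region is skipped
    }
    for region, (scale, shift) in align_regions(anchors).items():
        print(region, scale, shift)  # facade: scale and shift both close to 2.0
```

Fitting per region rather than once per image lets each region absorb its own depth distortion, at the cost of needing enough SfM points inside every region. How regions with too few anchors should be handled is a design choice the summary leaves open; skipping them here is only a placeholder.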

Problem

Research questions and friction points this paper is trying to address.

Reconstructing 3D scenes from sparse multi-appearance photo collections

Addressing oversmoothing and overfitting in sparse-view 3D reconstruction

Enhancing novel view synthesis under varying appearance conditions

Innovation

Methods, ideas, or system contributions that make the work stand out.

Utilizes monocular depth priors for sparse initialization

Employs SfM-anchored semantic regions for alignment

Implements geometry-guided multi-view supervision scheme
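To illustrate how depth priors could supply extra points for a sparse initialization, here is a hedged sketch that back-projects an aligned depth map into 3D with a pinhole camera model. The page does not describe the paper's actual densification procedure; the pinhole intrinsics (`fx`, `fy`, `cx`, `cy`), the `stride` subsampling, and the function name are assumptions made for illustration, and the points are left in the camera frame (no camera-to-world transform is applied).

```python
from typing import List, Tuple

Point3 = Tuple[float, float, float]


def unproject_depth(
    depth: List[List[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
    stride: int = 1,
) -> List[Point3]:
    """Back-project a depth map into camera-frame 3D points (pinhole model).

    depth[v][u] is the depth along the optical axis at pixel column u, row v.
    Pixels with non-positive depth are treated as invalid and skipped.
    """
    points = []
    for v in range(0, len(depth), stride):
        row = depth[v]
        for u in range(0, len(row), stride):
            z = row[u]
            if z <= 0.0:
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


if __name__ == "__main__":
    # Tiny 2x2 depth map with one invalid pixel (depth 0.0).
    depth_map = [[2.0, 2.0], [0.0, 4.0]]
    for p in unproject_depth(depth_map, fx=2.0, fy=2.0, cx=0.5, cy=0.5):
        print(p)
```

In a 3DGS pipeline, points like these would still need to be transformed into world coordinates with the SfM camera pose before they could seed new Gaussians alongside the sparse SfM points.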
