Multimodal-Prior-Guided Importance Sampling for Hierarchical Gaussian Splatting in Sparse-View Novel View Synthesis

📅 2026-03-03
🤖 AI Summary
This work addresses the challenge of recovering fine geometric detail in novel view synthesis from sparse inputs, where existing methods often overfit to texture and suffer from inconsistencies in pose or appearance. To mitigate these issues, we propose a multimodal-prior-guided importance sampling mechanism that fuses photometric residuals, semantic cues, and geometric priors to estimate local recoverability. Guided by this estimate, our method selectively injects fine Gaussian points into a hierarchical 3D Gaussian Splatting representation. Because the placement of new Gaussians is informed by multimodal priors rather than by residual errors alone, the method reduces overfitting. We further introduce a geometry-aware strategy for adding and retaining Gaussians, which preserves newly introduced points in under-constrained regions. Our approach achieves state-of-the-art performance across multiple sparse-view benchmarks, with up to a 0.3 dB PSNR improvement on the DTU dataset.
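The recoverability-driven sampling step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the summary does not specify how the cues are fused, so the per-region `RegionCues` structure, the `[0, 1]` normalization of each cue, and the geometric-mean gating inside `recoverability` are all hypothetical choices. Regions are drawn with probability proportional to their fused score.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class RegionCues:
    """Per-region cues; each value is ASSUMED to be pre-normalized to [0, 1]."""
    residual: float   # photometric rendering residual
    semantic: float   # semantic-prior support (hypothetical proxy)
    geometric: float  # geometric-prior support, e.g. depth agreement (hypothetical)


def recoverability(c: RegionCues) -> float:
    """Hypothetical fusion rule: the residual counts only as far as both priors agree.

    The geometric mean of the two priors is zero if either prior gives no support,
    so a large residual on its own (e.g. from texture or pose noise) scores zero.
    """
    return c.residual * math.sqrt(c.semantic * c.geometric)


def sample_regions(cues: list[RegionCues], budget: int, seed: int = 0) -> list[int]:
    """Draw `budget` region indices with probability proportional to recoverability."""
    scores = [recoverability(c) for c in cues]
    if sum(scores) <= 0.0:
        return []  # no region has consistent evidence, so inject nothing
    rng = random.Random(seed)
    # Zero-score regions are never drawn by weighted sampling.
    return rng.choices(range(len(cues)), weights=scores, k=budget)


if __name__ == "__main__":
    cues = [
        RegionCues(residual=0.9, semantic=0.9, geometric=0.0),  # high residual, no geometric support
        RegionCues(residual=0.5, semantic=0.8, geometric=0.8),
        RegionCues(residual=0.2, semantic=1.0, geometric=1.0),
    ]
    picks = sample_regions(cues, budget=10, seed=1)
    print(picks)  # indices drawn only from regions 1 and 2
```

Gating the residual by the priors acts as a soft AND: in the example, region 0 has the largest residual but no geometric support, so it never receives new Gaussians. This reflects the stated goal of not chasing raw residuals alone.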

📝 Abstract
We present multimodal-prior-guided importance sampling as the central mechanism for hierarchical 3D Gaussian Splatting (3DGS) in sparse-view novel view synthesis. Our sampler fuses complementary cues (photometric rendering residuals, semantic priors, and geometric priors) into a robust local recoverability estimate that directly determines where fine Gaussians are injected. Built around this sampling core, our framework comprises (1) a coarse-to-fine Gaussian representation that encodes global shape in a stable coarse layer and selectively adds fine primitives where the multimodal metric indicates recoverable detail; and (2) a geometry-aware sampling and retention policy that concentrates refinement on geometrically critical and complex regions while protecting newly added primitives in under-constrained areas from premature pruning. By prioritizing regions supported by consistent multimodal evidence rather than raw residuals alone, our method alleviates overfitting to texture-induced errors and suppresses noise from pose and appearance inconsistencies. Experiments on diverse sparse-view benchmarks demonstrate state-of-the-art reconstructions, with up to +0.3 dB PSNR on DTU.
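The retention half of the geometry-aware policy, protecting newly added primitives from premature pruning, can be illustrated with a small pruning rule. This is a hypothetical sketch, not the paper's method: the grace window `base_grace`, the opacity threshold `min_opacity`, the use of `view_coverage` as a proxy for how under-constrained a region is, and the choice to exempt the coarse layer from pruning are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Gaussian:
    opacity: float
    birth_iter: int       # iteration at which the primitive was injected
    view_coverage: int    # HYPOTHETICAL proxy: number of training views observing it
    is_fine: bool = True  # fine-layer primitive (the coarse layer is kept stable)


def should_prune(g: Gaussian, it: int, min_opacity: float = 0.005,
                 base_grace: int = 500, sparse_views: int = 2) -> bool:
    """Hypothetical retention rule.

    Standard low-opacity pruning, except that recently added fine Gaussians are
    protected for a grace window, which is doubled when the primitive is seen by
    few views (an under-constrained region).
    """
    if not g.is_fine:
        return False  # assumption: the stable coarse layer is not pruned here
    grace = base_grace * (2 if g.view_coverage <= sparse_views else 1)
    if it - g.birth_iter < grace:
        return False  # still inside its protection window
    return g.opacity < min_opacity


def prune(gaussians: list[Gaussian], it: int) -> list[Gaussian]:
    """Keep every Gaussian that the retention rule does not mark for pruning."""
    return [g for g in gaussians if not should_prune(g, it)]


if __name__ == "__main__":
    it = 1700
    sparse = Gaussian(opacity=0.001, birth_iter=1000, view_coverage=2)  # 700 < 1000: kept
    dense = Gaussian(opacity=0.001, birth_iter=1000, view_coverage=5)   # 700 >= 500: pruned
    print(len(prune([sparse, dense], it)))  # prints 1
```

Scaling the grace window by view coverage is one simple way to encode "under-constrained": a primitive seen by few views receives weak gradient signal, so its opacity can stay low for longer before it is judged useless.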
Problem

Research questions and friction points this paper is trying to address.

sparse-view novel view synthesis
3D Gaussian Splatting
importance sampling
multimodal priors
overfitting
Innovation

Methods, ideas, or system contributions that make the work stand out.

multimodal-prior-guided sampling
hierarchical Gaussian Splatting
sparse-view novel view synthesis
geometric-aware refinement
importance sampling
Kaiqiang Xiong
Guangdong Provincial Key Laboratory of Ultra High Definition Immersive Media Technology, Shenzhen Graduate School, Peking University, China; Peng Cheng Laboratory, China
Zhanke Wang
Guangdong Provincial Key Laboratory of Ultra High Definition Immersive Media Technology, Shenzhen Graduate School, Peking University, China
Ronggang Wang
Shenzhen Graduate School, Peking University
Immersive Video Coding and Processing