Gaussian Scenes: Pose-Free Sparse-View Scene Reconstruction using Depth-Enhanced Diffusion Priors

πŸ“… 2024-11-24
πŸ›οΈ arXiv.org
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Existing 360Β° scene reconstruction methods struggle with sparse, uncalibrated 2D images lacking camera poses. To address this, we propose the first end-to-end reconstruction framework requiring no pose priors. Our method introduces a depth-augmented diffusion prior to jointly guide novel view synthesis and depth estimation; employs a FiLM-based modulation mechanism to unify geometric and contextual feature representation; designs a Gaussian point cloud confidence metric to detect artifacts; and establishes a Gaussian-SLAM–style progressive multi-view fusion pipeline. Leveraging 3D Gaussian splatting and confidence-weighted fusion, our approach significantly outperforms prior pose-free methods on MipNeRF360 and DL3DV-10K, achieving reconstruction completeness and multi-view consistency on par with state-of-the-art pose-aware approaches.

πŸ“ Abstract
In this work, we introduce a generative approach for pose-free (without camera parameters) reconstruction of 360° scenes from a sparse set of 2D images. Scene reconstruction from incomplete, pose-free observations is usually regularized with depth estimation or 3D foundational priors. While recent advances have enabled sparse-view reconstruction of large, complex scenes (with a high degree of foreground and background detail) with known camera poses using view-conditioned generative priors, these methods cannot be directly adapted to the pose-free setting, where ground-truth poses are unavailable during evaluation. To address this, we propose an image-to-image generative model designed to inpaint missing details and remove artifacts in novel view renders and depth maps of a 3D scene. We introduce context and geometry conditioning using Feature-wise Linear Modulation (FiLM) layers as a lightweight alternative to cross-attention, and we also propose a novel confidence measure for 3D Gaussian splat representations to allow for better detection of these artifacts. By progressively integrating these novel views in a Gaussian-SLAM-inspired process, we achieve a multi-view-consistent 3D representation. Evaluations on the MipNeRF360 and DL3DV-10K benchmark datasets demonstrate that our method surpasses existing pose-free techniques and performs competitively with state-of-the-art posed (precomputed camera parameters are given) reconstruction methods on complex 360° scenes.
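The FiLM conditioning mentioned in the abstract applies a learned per-channel scale and shift to a feature map, predicted from a conditioning vector. A minimal numpy sketch of the operation follows; all shapes, weight matrices, and the random toy inputs are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def film(features, cond, w_gamma, b_gamma, w_beta, b_beta):
    """Feature-wise Linear Modulation: scale and shift each feature
    channel using parameters predicted from a conditioning vector."""
    gamma = cond @ w_gamma + b_gamma   # (C,) per-channel scale
    beta = cond @ w_beta + b_beta      # (C,) per-channel shift
    # gamma/beta broadcast over the spatial dims of (H, W, C) features
    return gamma * features + beta

# toy shapes: an 8x8 feature map with 4 channels, a 6-dim conditioning code
H, W, C, D = 8, 8, 4, 6
x = rng.standard_normal((H, W, C))
c = rng.standard_normal(D)
w_g, b_g = rng.standard_normal((D, C)), np.zeros(C)
w_b, b_b = rng.standard_normal((D, C)), np.zeros(C)

y = film(x, c, w_g, b_g, w_b, b_b)
print(y.shape)  # (8, 8, 4)
```

Unlike cross-attention, this adds only two small linear projections per conditioned layer, which is what makes it a lightweight conditioning mechanism.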
Problem

Research questions and friction points this paper is trying to address.

Pose-free 360° scene reconstruction from sparse 2D images
Inpainting missing details in novel view renders and depth maps
Achieving multi-view-consistent 3D representation without ground-truth poses
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generative model for pose-free scene reconstruction
Depth-enhanced diffusion priors with FiLM modulation
Confidence measure for 3D Gaussian splatting
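The confidence-weighted fusion described in the summary could, in spirit, amount to per-Gaussian weighted averaging of attributes from overlapping views. The paper's actual confidence metric is not given here, so this sketch uses hypothetical confidence values and plain normalized weighting as a stand-in:

```python
import numpy as np

def fuse_gaussians(attrs_a, conf_a, attrs_b, conf_b):
    """Confidence-weighted fusion of per-Gaussian attributes
    (e.g. positions or colors) from two overlapping views."""
    w_a = conf_a / (conf_a + conf_b)   # normalized weight for view A
    w_b = 1.0 - w_a
    return w_a[:, None] * attrs_a + w_b[:, None] * attrs_b

# two views of the same 3 Gaussians, 3-D positions (toy values)
pos_a = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [2.0, 0.0, 0.0]])
pos_b = np.array([[0.2, 0.0, 0.0], [1.0, 1.2, 1.0], [2.0, 0.0, 0.4]])
conf_a = np.array([0.9, 0.5, 0.1])   # high-confidence splats dominate
conf_b = np.array([0.1, 0.5, 0.9])

fused = fuse_gaussians(pos_a, conf_a, pos_b, conf_b)
print(fused[0])  # stays close to pos_a[0], since conf_a[0] >> conf_b[0]
```

In a progressive, SLAM-style pipeline, low-confidence splats (likely artifacts) would contribute little to the fused scene, letting cleaner observations from later views overwrite them.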
πŸ”Ž Similar Papers
No similar papers found.
S
Soumava Paul
CCVL, Johns Hopkins University
Prakhar Kaushik
Johns Hopkins University
Cognitive-Inspired AI · Machine Learning · Computer Vision · 3D Vision · Signal Processing
Alan L. Yuille
CCVL, Johns Hopkins University