Dust to Tower: Coarse-to-Fine Photo-Realistic Scene Reconstruction from Sparse Uncalibrated Images

📅 2024-12-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address scene-level 3D reconstruction from sparse, uncalibrated images, this paper proposes Dust to Tower (D2T), a coarse-to-fine framework that jointly optimizes a 3D Gaussian Splatting (3DGS) scene representation and camera poses. The pipeline has three parts: (i) a Coarse Construction Module (CCM) that uses a fast multi-view stereo (MVS) model to initialize the 3DGS and recover initial camera poses, (ii) Confidence Aware Depth Alignment (CADA), which refines the coarse depth maps by aligning their confident regions with monocular depth estimates, and (iii) Warped Image-Guided Inpainting (WIGI), which warps training images to novel viewpoints using the refined depths and inpaints the resulting holes to provide additional supervision. The authors report state-of-the-art results on novel view synthesis and camera pose estimation while maintaining high efficiency, without requiring known camera parameters or densely captured images.
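The depth-alignment idea in the summary can be illustrated with a small sketch. This is not the authors' implementation: it assumes alignment means fitting a single scale and shift that maps the monocular depth onto the coarse depth at confident pixels, using confidence-weighted least squares. The function names, the flat-list inputs, and the `min_conf` threshold are all hypothetical, and plain Python lists are used to keep the example dependency-free.

```python
def fit_scale_shift(mono, coarse, conf, min_conf=0.5):
    """Fit s, t so that s * mono + t approximates coarse on confident pixels.

    mono, coarse, conf are flat lists of equal length (one value per pixel).
    Pixels with confidence below min_conf (an assumed threshold) are ignored;
    the remaining pixels are weighted by their confidence.
    """
    sw = swx = swy = swxx = swxy = 0.0
    for x, y, w in zip(mono, coarse, conf):
        if w < min_conf:
            continue
        sw += w
        swx += w * x
        swy += w * y
        swxx += w * x * x
        swxy += w * x * y
    # Determinant of the 2x2 weighted normal equations.
    det = sw * swxx - swx * swx
    if abs(det) < 1e-12:
        raise ValueError("too few confident, varied pixels to fit scale and shift")
    s = (sw * swxy - swx * swy) / det
    t = (swy - s * swx) / sw
    return s, t


def align_mono_depth(mono, coarse, conf, min_conf=0.5):
    """Return the monocular depth rescaled into the coarse depth's metric."""
    s, t = fit_scale_shift(mono, coarse, conf, min_conf)
    return [s * x + t for x in mono]


# Toy example: the last coarse value is an outlier with low confidence.
mono = [1.0, 2.0, 3.0, 4.0]
coarse = [2.5, 4.5, 6.5, 100.0]
conf = [0.9, 0.8, 0.9, 0.1]
aligned = align_mono_depth(mono, coarse, conf)
```

In this toy case the low-confidence outlier is excluded from the fit, so the aligned monocular depth replaces it with a value consistent with the confident pixels.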

📝 Abstract

Photo-realistic scene reconstruction from sparse-view, uncalibrated images is in high demand in practice. Although some progress has been made, existing methods are either sparse-view but require accurate camera parameters (i.e., intrinsics and extrinsics), or SfM-free but need densely captured images. To combine the advantages of both kinds of methods while addressing their respective weaknesses, we propose Dust to Tower (D2T), an accurate and efficient coarse-to-fine framework that optimizes 3DGS and image poses simultaneously from sparse and uncalibrated images. Our key idea is to first construct a coarse model efficiently and subsequently refine it using warped and inpainted images at novel viewpoints. To do this, we first introduce a Coarse Construction Module (CCM), which exploits a fast Multi-View Stereo model to initialize a 3D Gaussian Splatting (3DGS) model and recover initial camera poses. To refine the 3D model at novel viewpoints, we propose a Confidence Aware Depth Alignment (CADA) module that refines the coarse depth maps by aligning their confident parts with depths estimated by a mono-depth model. Then, a Warped Image-Guided Inpainting (WIGI) module warps the training images to novel viewpoints using the refined depth maps, and inpainting is applied to fill the "holes" in the warped images caused by view-direction changes, providing high-quality supervision to further optimize the 3D model and the camera poses. Extensive experiments and ablation studies demonstrate the validity of D2T and its design choices, achieving state-of-the-art performance in both novel view synthesis and pose estimation while maintaining high efficiency. Code will be made publicly available.
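To make the warping step concrete, here is a minimal sketch of depth-based forward warping that also reports the holes an inpainting model would later fill. It is an illustration under stated assumptions, not the paper's WIGI module: it assumes a pinhole camera with intrinsics `(fx, fy, cx, cy)`, a known relative pose `X_tgt = R * X_src + t`, nearest-pixel splatting, and a z-buffer to resolve collisions. All names are hypothetical.

```python
def warp_to_novel_view(image, depth, K, R, t):
    """Forward-warp an H x W image into a novel view using per-pixel depth.

    image, depth: H x W nested lists. K = (fx, fy, cx, cy).
    R: 3x3 rotation (list of rows), t: length-3 translation, source -> target.
    Returns (warped, holes); holes[v][u] is True where no source pixel landed.
    """
    fx, fy, cx, cy = K
    h, w = len(image), len(image[0])
    warped = [[None] * w for _ in range(h)]
    zbuf = [[float("inf")] * w for _ in range(h)]
    for v in range(h):
        for u in range(w):
            z = depth[v][u]
            if z <= 0:
                continue  # invalid depth, nothing to project
            # Back-project the pixel to a 3D point in the source camera frame.
            X = ((u - cx) / fx * z, (v - cy) / fy * z, z)
            # Move the point into the target camera frame.
            Xt = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
            if Xt[2] <= 0:
                continue  # behind the target camera
            # Project into the target image and snap to the nearest pixel.
            ut = round(fx * Xt[0] / Xt[2] + cx)
            vt = round(fy * Xt[1] / Xt[2] + cy)
            # Keep the closest surface when several pixels land together.
            if 0 <= ut < w and 0 <= vt < h and Xt[2] < zbuf[vt][ut]:
                zbuf[vt][ut] = Xt[2]
                warped[vt][ut] = image[v][u]
    holes = [[px is None for px in row] for row in warped]
    return warped, holes


# Toy example: a 1 x 4 image at unit depth, camera shifted along x.
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
image = [[10, 20, 30, 40]]
depth = [[1.0, 1.0, 1.0, 1.0]]
warped, holes = warp_to_novel_view(image, depth, (1.0, 1.0, 0.0, 0.0), identity, (1.0, 0.0, 0.0))
```

The `holes` mask marks target pixels that received no source pixel, which is where an inpainting step would synthesize content before the warped image is used as supervision.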

Problem

Research questions and friction points this paper is trying to address.

3D Reconstruction

Photorealistic Rendering

Sparse Imagery

Innovation

Methods, ideas, or system contributions that make the work stand out.

Dust to Tower (D2T)

Confidence Aware Depth Alignment (CADA) module

Warped Image-Guided Inpainting (WIGI) module

🔎 Similar Papers

Generalizable 3D Scene Reconstruction via Divide and Conquer from a Single View