Dust to Tower: Coarse-to-Fine Photo-Realistic Scene Reconstruction from Sparse Uncalibrated Images

📅 2024-12-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
To reconstruct 3D scenes from sparse, uncalibrated images, this paper proposes Dust to Tower (D2T), a coarse-to-fine framework that jointly optimizes a 3D Gaussian Splatting (3DGS) representation and the camera poses. The pipeline has three modules: (i) a Coarse Construction Module (CCM) that uses a fast multi-view stereo (MVS) model to initialize the 3DGS and recover initial camera poses, (ii) a Confidence Aware Depth Alignment (CADA) module that refines the coarse depth maps by aligning their confident regions with monocular depth estimates, and (iii) a Warped Image-Guided Inpainting (WIGI) module that warps the training images to novel viewpoints using the refined depths and inpaints the resulting holes, which provides additional supervision for both the 3D model and the poses. The authors report state-of-the-art results on novel view synthesis and camera pose estimation while remaining efficient, without requiring known camera parameters or densely captured images.

📝 Abstract
Photo-realistic scene reconstruction from sparse-view, uncalibrated images is in high demand in practice. Despite some progress, existing methods are either sparse-view but require accurate camera parameters (i.e., intrinsics and extrinsics), or SfM-free but need densely captured images. To combine the advantages of both lines of work while addressing their respective weaknesses, we propose Dust to Tower (D2T), an accurate and efficient coarse-to-fine framework that optimizes 3D Gaussian Splatting (3DGS) and image poses simultaneously from sparse and uncalibrated images. Our key idea is to first construct a coarse model efficiently and then refine it using warped and inpainted images at novel viewpoints. To do this, we first introduce a Coarse Construction Module (CCM), which exploits a fast Multi-View Stereo model to initialize a 3DGS model and recover initial camera poses. To refine the 3D model at novel viewpoints, we propose a Confidence Aware Depth Alignment (CADA) module, which refines the coarse depth maps by aligning their confident parts with depths estimated by a mono-depth model. Then, a Warped Image-Guided Inpainting (WIGI) module warps the training images to novel viewpoints using the refined depth maps and applies inpainting to fill the "holes" in the warped images caused by view-direction changes, providing high-quality supervision to further optimize the 3D model and the camera poses. Extensive experiments and ablation studies demonstrate the validity of D2T and its design choices: it achieves state-of-the-art performance in both novel view synthesis and pose estimation while remaining highly efficient. Code will be made publicly available.
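The depth-alignment idea in the abstract can be illustrated with a small sketch. The abstract does not give the exact formulation, so everything below is an assumption: the function name `align_mono_to_coarse`, the threshold `conf_thresh`, and the choice of a confidence-weighted least-squares fit of one global scale and shift that maps the mono-depth prediction onto the confident pixels of the coarse depth map. It is not the paper's CADA implementation.

```python
import numpy as np

def align_mono_to_coarse(coarse, mono, conf, conf_thresh=0.5):
    """Fit scale s and shift t so that s * mono + t ~= coarse on confident pixels.

    Hypothetical helper for illustration only; the affine model and the
    threshold are assumptions, not taken from the paper.
    """
    mask = conf > conf_thresh
    if mask.sum() < 2:
        raise ValueError("need at least two confident pixels to fit scale and shift")
    x = mono[mask].ravel()    # mono-depth values at confident pixels
    y = coarse[mask].ravel()  # coarse (MVS) depth at the same pixels
    w = conf[mask].ravel()    # confidences used as per-pixel weights
    # Weighted least squares: minimise sum_i w_i * (s * x_i + t - y_i)^2.
    A = np.stack([x, np.ones_like(x)], axis=1)
    sw = np.sqrt(w)
    (s, t), *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    # The aligned mono depth is dense, so it also covers low-confidence pixels.
    return s * mono + t, s, t
```

Under these assumptions, the returned map keeps the metric scale of the coarse reconstruction where it is trusted and falls back on the mono-depth structure elsewhere.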
Problem

Research questions and friction points this paper is trying to address.

3D Reconstruction
Photorealistic Rendering
Sparse Imagery
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dust to Tower (D2T)
Confidence Aware Depth Alignment (CADA) module
Warped Image-Guided Inpainting (WIGI) module
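To make the warping step behind WIGI concrete, here is a minimal sketch of depth-based forward warping that also reports the "holes" an inpainting model would then have to fill. The function name `forward_warp`, the pinhole intrinsics `K`, the source-to-target pose convention for `R` and `t`, and the nearest-pixel splatting with a z-buffer are all assumptions made for illustration; the paper's actual warping and inpainting procedure may differ.

```python
import numpy as np

def forward_warp(image, depth, K, R, t):
    """Warp `image` (H, W, C) with per-pixel `depth` (H, W) into a target view.

    Assumed convention: R (3x3) and t (3,) map source-camera coordinates to
    target-camera coordinates. Returns the warped image and a boolean mask of
    holes (target pixels that no source pixel landed on).
    """
    H, W = depth.shape
    v, u = np.mgrid[0:H, 0:W]
    pix = np.stack([u.ravel(), v.ravel(), np.ones(H * W)]).astype(np.float64)
    # Back-project source pixels to 3D, then move them into the target frame.
    pts = (np.linalg.inv(K) @ pix) * depth.reshape(1, -1)
    pts = R @ pts + np.asarray(t, dtype=np.float64).reshape(3, 1)
    z = pts[2]
    front = z > 1e-8  # keep only points in front of the target camera
    proj = K @ pts[:, front]
    ut = np.round(proj[0] / proj[2]).astype(np.int64)
    vt = np.round(proj[1] / proj[2]).astype(np.int64)
    inside = (ut >= 0) & (ut < W) & (vt >= 0) & (vt < H)
    src = np.flatnonzero(front)[inside]  # flat indices of usable source pixels
    dst = vt[inside] * W + ut[inside]    # flat indices of their target pixels
    # Z-buffer: for each target pixel keep the smallest depth that lands on it.
    zbuf = np.full(H * W, np.inf)
    np.minimum.at(zbuf, dst, z[src])
    nearest = z[src] == zbuf[dst]
    colors = image.reshape(H * W, -1)
    warped = np.zeros_like(colors)
    warped[dst[nearest]] = colors[src[nearest]]
    holes = np.isinf(zbuf).reshape(H, W)
    return warped.reshape(image.shape), holes
```

In a pipeline like the one described, the `holes` mask would be passed together with the warped image to an inpainting model, and the completed image would serve as supervision at the novel viewpoint.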
Xudong Cai, Renmin University of China (computer vision, camera localization, SLAM)
Yongcai Wang, School of Information, Renmin University of China, Beijing, China
Zhaoxin Fan, Beijing Advanced Innovation Center for Future Blockchain and Privacy Computing, Institute of Artificial Intelligence, Beihang University, Beijing, China
Haoran Deng, UCLA (Machine Learning, Data Mining)
Shuo Wang, School of Information, Renmin University of China, Beijing, China
Wanting Li, School of Information, Renmin University of China, Beijing, China
Deying Li, School of Information, Renmin University of China, Beijing, China
Lun Luo, Zhejiang University (SLAM, Place Recognition)
Minhang Wang, HAOMO.AI, Beijing, China
Jintao Xu, HAOMO.AI, Beijing, China