🤖 AI Summary
In generative 3D reconstruction, narrow-baseline local observations (such as single-view or small-FOV images) pose two key challenges: severe viewpoint limitation and inconsistent generation of occluded regions. To address these, this paper introduces a fine-tuning-free, zero-shot fusion framework. Methodologically, it enforces joint alignment of multi-source priors (textual, CLIP-based, and depth) within the DDIM sampling process, and incorporates an iterative, geometry-guided implicit-field optimization that integrates local dense observations with semantic and geometric priors. Experiments demonstrate that the approach significantly outperforms state-of-the-art methods across multiple benchmarks, with marked improvements in the visual consistency and geometric completeness of unobserved regions, enabling high-fidelity, globally consistent zero-shot 3D reconstruction from narrow-baseline inputs alone.
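The core fusion idea, aligning several prior-conditioned noise predictions inside deterministic DDIM sampling, can be sketched as follows. This is an illustrative toy, not the paper's implementation: the three predictors, the weights, and the schedule are all hypothetical stand-ins for the text-, CLIP-, and depth-conditioned models.

```python
import numpy as np

def ddim_step(x_t, eps, alpha_bar_t, alpha_bar_prev):
    """One deterministic DDIM update (eta = 0)."""
    x0_pred = (x_t - np.sqrt(1.0 - alpha_bar_t) * eps) / np.sqrt(alpha_bar_t)
    return np.sqrt(alpha_bar_prev) * x0_pred + np.sqrt(1.0 - alpha_bar_prev) * eps

def fused_noise(x_t, t, priors, weights):
    """Fuse noise predictions from several priors into one estimate.

    `priors` is a list of callables (x_t, t) -> eps; in the paper these
    would be text-, CLIP-, and depth-conditioned predictors.
    """
    return sum(w * p(x_t, t) for p, w in zip(priors, weights)) / sum(weights)

# Toy stand-ins for the three prior-conditioned noise predictors.
text_prior  = lambda x, t: 0.8 * x
clip_prior  = lambda x, t: 1.1 * x
depth_prior = lambda x, t: 0.9 * x

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 4))                  # toy "latent image"
alpha_bars = np.linspace(0.05, 0.999, 10)    # increasing alpha-bar schedule
for i in range(len(alpha_bars) - 1, 0, -1):  # denoise from t = T down to 1
    eps = fused_noise(x, i, [text_prior, clip_prior, depth_prior],
                      weights=[1.0, 0.5, 0.5])
    x = ddim_step(x, eps, alpha_bars[i], alpha_bars[i - 1])
```

Because DDIM sampling is deterministic, fusing the priors at every step yields one shared trajectory, which is what keeps the generated supervision views mutually consistent.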
📝 Abstract
Generative 3D reconstruction shows strong potential for handling incomplete observations. While sparse-view and single-image reconstruction are well studied, partial observation remains underexplored. In this setting, dense views are available only within a narrow angular range, while other perspectives are inaccessible. This task presents two main challenges: (i) Limited view range: observations confined to a narrow angular scope render traditional interpolation techniques, which require evenly distributed perspectives, ineffective. (ii) Inconsistent generation: views synthesized for invisible regions often lack coherence both with the visible regions and with each other, compromising reconstruction consistency. To address these challenges, we propose a novel training-free approach that integrates local dense observations with multi-source priors for reconstruction. Our method introduces a fusion-based strategy that aligns these priors during DDIM sampling, generating multi-view consistent images to supervise invisible views. We further design an iterative refinement strategy that exploits the geometric structure of the object to enhance reconstruction quality. Extensive experiments on multiple datasets show the superiority of our method over state-of-the-art approaches, especially in invisible regions.
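The iterative refinement strategy alternates between generating pseudo-supervision for the invisible views from the current geometry and re-optimizing the implicit field against both real and pseudo targets. The toy below sketches that loop under heavy simplification: a linear "field" with per-view matrices standing in for rendering, and least-squares fitting standing in for implicit-field optimization; every name and weight here is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear "implicit field": theta holds the field parameters and a
# per-view matrix plays the role of differentiable rendering.
theta_true       = rng.normal(size=8)
render_visible   = rng.normal(size=(16, 8))  # densely observed angular range
render_invisible = rng.normal(size=(16, 8))  # unobserved viewpoints
y_visible = render_visible @ theta_true      # the only real supervision

def iterative_refine(theta, rounds=5, steps=100, lr=0.01, w_pseudo=0.1):
    for _ in range(rounds):
        # (1) use the current geometry to generate pseudo-targets for the
        #     invisible views (stand-in for prior-guided view generation)
        y_pseudo = render_invisible @ theta
        # (2) re-optimize the field against real + pseudo supervision
        for _ in range(steps):
            g = render_visible.T @ (render_visible @ theta - y_visible)
            g += w_pseudo * render_invisible.T @ (render_invisible @ theta - y_pseudo)
            theta = theta - lr * g
    return theta

theta0 = np.zeros(8)
theta_ref = iterative_refine(theta0)
err_before = np.linalg.norm(theta0 - theta_true)
err_after = np.linalg.norm(theta_ref - theta_true)
```

Down-weighting the pseudo term (`w_pseudo`) keeps the real observations dominant, so each round tightens the geometry that conditions the next round's generated views.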