NavCrafter: Exploring 3D Scenes from a Single Image

πŸ“… 2026-04-03
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This work addresses the challenge of generating navigable 3D scenes from a single image by proposing a novel approach that integrates video diffusion models with geometry-aware expansion. The method employs a multi-stage camera control mechanism and a collision-aware trajectory planner to synthesize temporally and spatially consistent novel-view videos. Furthermore, it introduces an enhanced 3D Gaussian splatting pipeline that incorporates depth-aligned supervision and structural regularization to improve geometric fidelity. The proposed framework achieves state-of-the-art performance in novel view synthesis under large viewpoint changes, significantly enhancing both the accuracy of 3D reconstruction and the comprehensiveness of scene coverage.
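The summary mentions a collision-aware trajectory planner but does not describe it. As a rough illustration only (the paper's actual planner is not specified at this level, and all names here are hypothetical), a minimal collision-aware camera path can be sketched by sampling waypoints and pushing any that violate a clearance threshold away from the nearest obstacle point:

```python
import numpy as np

def plan_trajectory(start, goal, obstacles, n=20, clearance=0.5):
    """Illustrative collision-aware trajectory sketch (not the paper's method):
    sample a straight-line camera path from `start` to `goal`, then push any
    waypoint that comes within `clearance` of an obstacle point outward."""
    waypoints = np.linspace(start, goal, n)           # (n, 3) camera positions
    for i, p in enumerate(waypoints):
        d = np.linalg.norm(obstacles - p, axis=1)     # distances to obstacle points
        j = d.argmin()
        if d[j] < clearance:                          # too close: move the waypoint
            away = (p - obstacles[j]) / (d[j] + 1e-8) # unit vector away from obstacle
            waypoints[i] = obstacles[j] + away * clearance
    return waypoints
```

A real planner would also enforce path smoothness and keep the camera oriented toward the scene; this sketch only captures the clearance constraint.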
πŸ“ Abstract
Creating flexible 3D scenes from a single image is vital when direct 3D data acquisition is costly or impractical. We introduce NavCrafter, a novel framework that explores 3D scenes from a single image by synthesizing novel-view video sequences with camera controllability and temporal-spatial consistency. NavCrafter leverages video diffusion models to capture rich 3D priors and adopts a geometry-aware expansion strategy to progressively extend scene coverage. To enable controllable multi-view synthesis, we introduce a multi-stage camera control mechanism that conditions diffusion models with diverse trajectories via dual-branch camera injection and attention modulation. We further propose a collision-aware camera trajectory planner and an enhanced 3D Gaussian Splatting (3DGS) pipeline with depth-aligned supervision, structural regularization and refinement. Extensive experiments demonstrate that NavCrafter achieves state-of-the-art novel-view synthesis under large viewpoint shifts and substantially improves 3D reconstruction fidelity.
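The abstract names "depth-aligned supervision" for the 3DGS pipeline without giving details. A common way to supervise rendered depth with a monocular depth prior is to first align the prior by a least-squares scale and shift; the following is a minimal sketch of that idea under this assumption (function names and the exact alignment are illustrative, not taken from the paper):

```python
import numpy as np

def align_depth(pred, ref, mask=None):
    """Fit scale s and shift t so that s * pred + t best matches ref in the
    least-squares sense (hypothetical helper; the paper's exact alignment
    procedure is not specified)."""
    p, r = pred.ravel(), ref.ravel()
    if mask is not None:
        m = mask.ravel()
        p, r = p[m], r[m]
    A = np.stack([p, np.ones_like(p)], axis=1)        # columns: [pred, 1]
    (s, t), *_ = np.linalg.lstsq(A, r, rcond=None)
    return s * pred + t

def depth_aligned_l1(pred, ref):
    """Depth-aligned L1 loss: align the monocular prior to the rendered
    depth, then penalize the remaining per-pixel discrepancy."""
    return np.abs(align_depth(pred, ref) - ref).mean()
```

Because monocular depth is only defined up to scale and shift, aligning before comparison avoids penalizing the Gaussian splats for a global ambiguity the prior cannot resolve.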
Problem

Research questions and friction points this paper is trying to address.

3D scene reconstruction
single-image 3D
novel-view synthesis
camera controllability
temporal-spatial consistency
Innovation

Methods, ideas, or system contributions that make the work stand out.

video diffusion models
camera controllability
geometry-aware expansion
3D Gaussian Splatting
novel-view synthesis
Hongbo Duan
Center for Artificial Intelligence and Robotics, Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China; Peng Cheng Laboratory, 518108, China
Peiyu Zhuang
School of Cyber Science and Technology, Shenzhen Campus of Sun Yat-sen University, China
Yi Liu
Tsinghua University
Robot Vision, SLAM
Zhengyang Zhang
Center for Artificial Intelligence and Robotics, Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China
Yuxin Zhang
Center for Artificial Intelligence and Robotics, Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China
Pengting Luo
Central Media Technology Institute, Huawei Incorporated Company, China
Fangming Liu
Professor, School of Computer Science & Technology, Huazhong University of Science & Technology
AI & Cloud Computing, Datacenter, LLM System, Edge Computing, Green Computing
Xueqian Wang
Tsinghua University
Information Fusion, Target Detection, Radar Imaging, Image Processing