NavCrafter: Exploring 3D Scenes from a Single Image

πŸ“… 2026-04-03
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This work addresses the challenge of generating navigable 3D scenes from a single image by proposing a novel approach that integrates video diffusion models with geometry-aware expansion. The method employs a multi-stage camera control mechanism and a collision-aware trajectory planner to synthesize temporally and spatially consistent novel-view videos. Furthermore, it introduces an enhanced 3D Gaussian splatting pipeline that incorporates depth-aligned supervision and structural regularization to improve geometric fidelity. The proposed framework achieves state-of-the-art performance in novel view synthesis under large viewpoint changes, significantly enhancing both the accuracy of 3D reconstruction and the comprehensiveness of scene coverage.
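The summary mentions a collision-aware trajectory planner but does not describe it. As a rough illustration only (the paper's actual planner is not specified at this level, and all names here are hypothetical), a minimal collision-aware camera path can be sketched by sampling waypoints and pushing any that violate a clearance threshold away from the nearest obstacle point:

```python
import numpy as np

def plan_trajectory(start, goal, obstacles, n=20, clearance=0.5):
    """Illustrative collision-aware trajectory sketch (not the paper's method):
    sample a straight-line camera path from `start` to `goal`, then push any
    waypoint that comes within `clearance` of an obstacle point outward."""
    waypoints = np.linspace(start, goal, n)           # (n, 3) camera positions
    for i, p in enumerate(waypoints):
        d = np.linalg.norm(obstacles - p, axis=1)     # distances to obstacle points
        j = d.argmin()
        if d[j] < clearance:                          # too close: move the waypoint
            away = (p - obstacles[j]) / (d[j] + 1e-8) # unit vector away from obstacle
            waypoints[i] = obstacles[j] + away * clearance
    return waypoints
```

A real planner would also enforce path smoothness and keep the camera oriented toward the scene; this sketch only captures the clearance constraint.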
πŸ“ Abstract
Creating flexible 3D scenes from a single image is vital when direct 3D data acquisition is costly or impractical. We introduce NavCrafter, a novel framework that explores 3D scenes from a single image by synthesizing novel-view video sequences with camera controllability and temporal-spatial consistency. NavCrafter leverages video diffusion models to capture rich 3D priors and adopts a geometry-aware expansion strategy to progressively extend scene coverage. To enable controllable multi-view synthesis, we introduce a multi-stage camera control mechanism that conditions diffusion models with diverse trajectories via dual-branch camera injection and attention modulation. We further propose a collision-aware camera trajectory planner and an enhanced 3D Gaussian Splatting (3DGS) pipeline with depth-aligned supervision, structural regularization and refinement. Extensive experiments demonstrate that NavCrafter achieves state-of-the-art novel-view synthesis under large viewpoint shifts and substantially improves 3D reconstruction fidelity.
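The abstract names "depth-aligned supervision" for the 3DGS pipeline without giving details. A common way to supervise rendered depth with a monocular depth prior is to first align the prior by a least-squares scale and shift; the following is a minimal sketch of that idea under this assumption (function names and the exact alignment are illustrative, not taken from the paper):

```python
import numpy as np

def align_depth(pred, ref, mask=None):
    """Fit scale s and shift t so that s * pred + t best matches ref in the
    least-squares sense (hypothetical helper; the paper's exact alignment
    procedure is not specified)."""
    p, r = pred.ravel(), ref.ravel()
    if mask is not None:
        m = mask.ravel()
        p, r = p[m], r[m]
    A = np.stack([p, np.ones_like(p)], axis=1)        # columns: [pred, 1]
    (s, t), *_ = np.linalg.lstsq(A, r, rcond=None)
    return s * pred + t

def depth_aligned_l1(pred, ref):
    """Depth-aligned L1 loss: align the monocular prior to the rendered
    depth, then penalize the remaining per-pixel discrepancy."""
    return np.abs(align_depth(pred, ref) - ref).mean()
```

Because monocular depth is only defined up to scale and shift, aligning before comparison avoids penalizing the Gaussian splats for a global ambiguity the prior cannot resolve.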
Problem

Research questions and friction points this paper is trying to address.

3D scene reconstruction
single-image 3D
novel-view synthesis
camera controllability
temporal-spatial consistency
Innovation

Methods, ideas, or system contributions that make the work stand out.

video diffusion models
camera controllability
geometry-aware expansion
3D Gaussian Splatting
novel-view synthesis
Hongbo Duan
Center for Artificial Intelligence and Robotics, Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China; Peng Cheng Laboratory, 518108, China
Peiyu Zhuang
School of Cyber Science and Technology, Shenzhen Campus of Sun Yat-sen University, China
Yi Liu
Tsinghua University
Robot Vision, SLAM
Zhengyang Zhang
Center for Artificial Intelligence and Robotics, Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China
Yuxin Zhang
Center for Artificial Intelligence and Robotics, Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China
Pengting Luo
Central Media Technology Institute, Huawei Incorporated Company, China
Fangming Liu
Professor, School of Computer Science & Technology, Huazhong University of Science & Technology
AI & Cloud Computing, Datacenter, LLM System, Edge Computing, Green Computing
Xueqian Wang
Tsinghua University
Information Fusion, Target Detection, Radar Imaging, Image Processing