🤖 AI Summary
This work addresses large missing regions and global inconsistency in 3D indoor scene reconstruction from sparse inputs. The authors propose Rein3D, a framework that follows a "restore-and-refine" paradigm: imperfect panoramic videos are first rendered from a coarse 3D Gaussian Splatting (3DGS) initialization via radial exploration from the origin, then restored by a temporally consistent panoramic video-to-video diffusion model and further enhanced by video super-resolution. The refined videos serve as pseudo-ground truths that guide the update of the global 3D Gaussian field. The study is described as the first to couple panoramic video diffusion priors with 3D Gaussian representations, and it introduces PanoV2V-15K, a dataset of over 15K paired clean and degraded panoramic videos, described as the first large-scale dataset for panoramic video restoration. In the reported experiments, the method significantly outperforms existing baselines on long-range camera exploration and produces photorealistic, globally consistent 360-degree indoor scenes.
📝 Abstract
The growing demand for Embodied AI and VR applications has highlighted the need for synthesizing high-quality 3D indoor scenes from sparse inputs. However, existing approaches struggle to infer massive amounts of missing geometry in large unseen areas while maintaining global consistency, often producing locally plausible but globally inconsistent reconstructions. We present Rein3D, a framework that reconstructs full 360-degree indoor environments by coupling explicit 3D Gaussian Splatting (3DGS) with temporally coherent priors from video diffusion models. Our approach follows a "restore-and-refine" paradigm: we employ a radial exploration strategy to render imperfect panoramic videos along trajectories starting from the origin, effectively uncovering occluded regions from a coarse 3DGS initialization. These sequences are restored by a panoramic video-to-video diffusion model and further enhanced via video super-resolution to synthesize high-fidelity geometry and textures. Finally, these refined videos serve as pseudo-ground truths to update the global 3D Gaussian field. To support this task, we construct PanoV2V-15K, a dataset of over 15K paired clean and degraded panoramic videos for diffusion-based scene restoration. Experiments demonstrate that Rein3D produces photorealistic and globally consistent 3D scenes and significantly improves long-range camera exploration compared with existing baselines.
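The radial exploration step can be pictured with a minimal sketch. The abstract only states that trajectories start from the origin; the straight-line paths, evenly spaced azimuths, fixed camera height, and the function name `radial_trajectories` below are all illustrative assumptions, not the authors' implementation. Because a panoramic camera covers every viewing direction, the sketch only generates camera positions.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def radial_trajectories(
    num_directions: int,
    max_radius: float,
    frames_per_path: int,
    height: float = 0.0,
) -> List[List[Vec3]]:
    """Return one straight camera path per azimuth, each starting at the origin.

    Hypothetical helper: path shape, spacing, and parameters are assumptions
    made for illustration only.
    """
    if num_directions < 1 or frames_per_path < 2:
        raise ValueError("need at least 1 direction and 2 frames per path")

    paths: List[List[Vec3]] = []
    for k in range(num_directions):
        # Evenly spaced azimuth angles around the vertical (y) axis.
        azimuth = 2.0 * math.pi * k / num_directions
        dx, dz = math.cos(azimuth), math.sin(azimuth)

        path: List[Vec3] = []
        for i in range(frames_per_path):
            # Move outward linearly from radius 0 to max_radius.
            r = max_radius * i / (frames_per_path - 1)
            path.append((r * dx, height, r * dz))
        paths.append(path)
    return paths


if __name__ == "__main__":
    for path in radial_trajectories(num_directions=4, max_radius=2.0, frames_per_path=3):
        print(path)
```

Under these assumptions, each path would be rendered from the coarse 3DGS scene as one panoramic video, and the restored version of that video would then supervise the Gaussian field at the same camera positions.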