World Reconstruction From Inconsistent Views

📅 2026-03-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the 3D inconsistency of frames generated by video diffusion models, which hinders the reconstruction of coherent, explorable 3D scenes. To overcome this limitation, the authors align individual frames into a globally consistent coordinate system using non-rigid Iterative Closest Point (ICP) registration. This alignment is further refined through a global point cloud optimization coupled with an inverse deformation-aware rendering loss. The method turns video diffusion models into generators of 3D-consistent worlds, a capability not previously achieved, and yields significant improvements over existing baselines in both geometric detail and spatial coherence. The resulting reconstructions are interactive, structurally complete 3D environments suitable for immersive exploration.

📝 Abstract
Video diffusion models generate high-quality and diverse worlds; however, individual frames often lack 3D consistency across the output sequence, which makes the reconstruction of 3D worlds difficult. To this end, we propose a new method that handles these inconsistencies by non-rigidly aligning the video frames into a globally-consistent coordinate frame that produces sharp and detailed pointcloud reconstructions. First, a geometric foundation model lifts each frame into a pixel-wise 3D pointcloud, which contains unaligned surfaces due to these inconsistencies. We then propose a tailored non-rigid iterative frame-to-model ICP to obtain an initial alignment across all frames, followed by a global optimization that further sharpens the pointcloud. Finally, we leverage this pointcloud as initialization for 3D reconstruction and propose a novel inverse deformation rendering loss to create high quality and explorable 3D environments from inconsistent views. We demonstrate that our 3D scenes achieve higher quality than baselines, effectively turning video models into 3D-consistent world generators.
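The frame-to-model ICP at the core of the pipeline can be illustrated with a minimal rigid variant: each frame's pointcloud is iteratively matched to its nearest neighbours in the accumulated model cloud and re-aligned with a closed-form (Kabsch) fit. This is only a sketch of the classical building block; the paper's alignment is non-rigid and operates on pointclouds lifted by a geometric foundation model, and the `kabsch` and `icp_align` names below are illustrative, not the authors' code.

```python
import numpy as np

def kabsch(src, dst):
    """Closed-form least-squares rotation R and translation t mapping src onto dst."""
    src_c = src - src.mean(axis=0)
    dst_c = dst - dst.mean(axis=0)
    H = src_c.T @ dst_c
    U, _, Vt = np.linalg.svd(H)
    # reflection guard: force det(R) = +1
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst.mean(axis=0) - R @ src.mean(axis=0)
    return R, t

def icp_align(frame_pts, model_pts, iters=30):
    """Rigid frame-to-model ICP: alternate nearest-neighbour matching and Kabsch fits."""
    pts = frame_pts.copy()
    for _ in range(iters):
        # brute-force nearest neighbour of each frame point in the model cloud
        d2 = ((pts[:, None, :] - model_pts[None, :, :]) ** 2).sum(-1)
        matches = model_pts[d2.argmin(axis=1)]
        R, t = kabsch(pts, matches)
        pts = pts @ R.T + t
    return pts
```

In the paper's setting this rigid step is replaced by a non-rigid deformation per frame, and the aligned clouds then seed a global optimization and the inverse deformation rendering loss.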
Problem

Research questions and friction points this paper is trying to address.

3D consistency
world reconstruction
inconsistent views
video diffusion models
pointcloud alignment
Innovation

Methods, ideas, or system contributions that make the work stand out.

non-rigid alignment
3D consistency
pointcloud reconstruction
inverse deformation rendering
geometric foundation model