Matrix-3D: Omnidirectional Explorable 3D World Generation

📅 2025-08-11
🤖 AI Summary
Existing single-image or text-driven 3D world generation methods suffer from limited scene coverage and lack of free navigability. To address this, we propose the first unified framework for omnidirectional, explorable 3D world generation, integrating panoramic video diffusion with geometry-aware 3D reconstruction. Our method introduces a trajectory-guided panoramic video diffusion model, a feed-forward wide-field-of-view reconstruction network, and an optimization-driven end-to-end 3D reconstruction pipeline. To enable training and evaluation, we introduce Matrix-Pano, the first large-scale synthetic dataset featuring dense depth maps and multi-view camera trajectories. Experiments demonstrate state-of-the-art performance on both panoramic video generation and 3D world reconstruction, significantly improving spatial scale, geometric consistency, and interactivity of generated scenes, thereby enabling immersive, large-scale spatial exploration.

📝 Abstract
Explorable 3D world generation from a single image or text prompt forms a cornerstone of spatial intelligence. Recent works utilize video models to achieve wide-scope and generalizable 3D world generation. However, existing approaches often suffer from a limited scope in the generated scenes. In this work, we propose Matrix-3D, a framework that utilizes a panoramic representation for wide-coverage, omnidirectional, explorable 3D world generation by combining conditional video generation with panoramic 3D reconstruction. We first train a trajectory-guided panoramic video diffusion model that employs scene mesh renders as its condition, enabling high-quality and geometrically consistent scene video generation. To lift the panoramic scene video to a 3D world, we propose two separate methods: (1) a feed-forward large panorama reconstruction model for rapid 3D scene reconstruction and (2) an optimization-based pipeline for accurate and detailed 3D scene reconstruction. To facilitate effective training, we also introduce the Matrix-Pano dataset, the first large-scale synthetic collection comprising 116K high-quality static panoramic video sequences with depth and trajectory annotations. Extensive experiments demonstrate that our proposed framework achieves state-of-the-art performance in panoramic video generation and 3D world generation. See more at https://matrix-3d.github.io.
Problem

Research questions and friction points this paper is trying to address.

Generating explorable 3D worlds from single images or text prompts
Overcoming limited scope in existing 3D scene generation methods
Achieving high-quality panoramic 3D reconstruction with geometric consistency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Panoramic video diffusion model for 3D scenes
Feed-forward large panorama reconstruction model
Optimization-based detailed 3D scene reconstruction
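
The summary does not say which panorama parameterization or camera model Matrix-3D uses. As a rough illustration of what lifting a panorama with depth to 3D involves, the sketch below assumes an equirectangular image, depth stored as radial distance along each ray, and a camera with identity orientation. The function names (`pixel_to_ray`, `unproject`, `project`) and the axis convention (y up, z forward) are hypothetical choices for this example, not the authors' implementation.

```python
import math


def pixel_to_ray(u, v, width, height):
    """Unit ray for pixel (u, v) of an equirectangular panorama.

    Assumed convention: longitude spans the image width, latitude spans
    the height, +y is up, and +z is the forward direction at the center.
    """
    lon = (u + 0.5) / width * 2.0 * math.pi - math.pi    # [-pi, pi)
    lat = math.pi / 2.0 - (v + 0.5) / height * math.pi   # (-pi/2, pi/2)
    return (
        math.cos(lat) * math.sin(lon),
        math.sin(lat),
        math.cos(lat) * math.cos(lon),
    )


def unproject(u, v, depth, width, height):
    """Lift a pixel with radial depth to a 3D point in the camera frame."""
    dx, dy, dz = pixel_to_ray(u, v, width, height)
    return (depth * dx, depth * dy, depth * dz)


def project(point, cam_pos, width, height):
    """Project a 3D point into a panorama centered at cam_pos.

    Returns (u, v, r): continuous pixel coordinates and radial distance.
    Camera rotation is ignored (identity orientation is assumed).
    """
    x, y, z = (p - c for p, c in zip(point, cam_pos))
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        raise ValueError("point coincides with the camera center")
    lon = math.atan2(x, z)
    lat = math.asin(max(-1.0, min(1.0, y / r)))
    u = (lon + math.pi) / (2.0 * math.pi) * width - 0.5
    v = (math.pi / 2.0 - lat) / math.pi * height - 0.5
    return u, v, r


if __name__ == "__main__":
    W, H = 512, 256
    # Lift one pixel to 3D, then view it from a camera moved along +z.
    p = unproject(100, 50, 2.5, W, H)
    print(project(p, (0.0, 0.0, 0.0), W, H))
    print(project(p, (0.0, 0.0, 1.0), W, H))
```

Projecting the lifted point back from the original camera position recovers the starting pixel and depth, while projecting it from a displaced position shows how scene content shifts in the panorama as the camera moves along a trajectory.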