🤖 AI Summary
Existing single-image-to-3D scene generation methods suffer from geometric distortions and blurry textures, largely because they rely on geometric cues extracted from a single frame, such as monocular depth. GeoWorld addresses this by first synthesizing consecutive video frames with a video diffusion model, then extracting globally consistent, full-frame geometry features from them with a geometry model, and feeding these features back as geometric conditions for generation along a given camera trajectory. Its key contributions are: (1) a geometry alignment loss that enforces structural consistency across multi-frame depth maps under camera-motion constraints; and (2) a lightweight geometry adaptation module that improves the transfer and utilization of cross-frame geometry features. Evaluated on ScanNet and Matterport3D, GeoWorld significantly outperforms state-of-the-art methods in PSNR, LPIPS, and Chamfer Distance, capturing both visual fidelity and geometric accuracy. Qualitative results further confirm that the reconstructed scenes exhibit high geometric precision and photorealistic texture quality.
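The summary's geometry alignment loss — enforcing depth consistency across frames under known camera motion — can be illustrated with a minimal sketch. The paper does not publish its exact formulation, so the function below is a hypothetical variant: it lifts one frame's depth map to 3D, transforms the points into a second camera with the relative pose `(R, t)`, reprojects them, and takes an L1 penalty between the transformed depths and the second frame's depth at the reprojected pixels. All names and the nearest-pixel lookup are illustrative assumptions.

```python
# Hypothetical sketch of a multi-frame geometry alignment loss
# (illustrative only; not the paper's exact formulation).
import numpy as np

def backproject(depth, K):
    """Lift a depth map (H, W) to camera-space 3D points (H*W, 3)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)
    rays = pix @ np.linalg.inv(K).T          # pixel -> unit-depth camera rays
    return rays * depth.reshape(-1, 1)       # scale rays by per-pixel depth

def geometry_alignment_loss(depth_i, depth_j, K, R, t):
    """L1 discrepancy between frame i's depth warped into frame j's
    camera (via relative pose R, t) and frame j's predicted depth."""
    pts_i = backproject(depth_i, K)
    pts_in_j = pts_i @ R.T + t               # move points into frame j's camera
    proj = pts_in_j @ K.T                    # project back to the image plane
    z = proj[:, 2]
    uv = np.round(proj[:, :2] / z[:, None]).astype(int)
    h, w = depth_j.shape
    valid = (z > 0) & (uv[:, 0] >= 0) & (uv[:, 0] < w) \
                    & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    if not valid.any():
        return 0.0                           # no overlap between the views
    d_pred = depth_j[uv[valid, 1], uv[valid, 0]]
    return float(np.abs(z[valid] - d_pred).mean())
```

With identical depth maps and an identity relative pose the loss is zero; translating the camera 0.5 units along the optical axis against an unchanged constant depth map yields a loss of 0.5, matching the intuition that the loss measures metric depth disagreement.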
📝 Abstract
Previous works that leverage video models for image-to-3D scene generation tend to suffer from geometric distortions and blurry content. In this paper, we renovate the image-to-3D scene generation pipeline by unlocking the potential of geometry models and present GeoWorld. Instead of exploiting geometric information obtained from a single-frame input, we first generate consecutive video frames and then use a geometry model to provide full-frame geometry features, which carry richer information than the single-frame depth maps or camera embeddings used in previous methods; these geometry features then serve as geometric conditions for the video generation model. To enhance the consistency of geometric structures, we further propose a geometry alignment loss that imposes real-world geometric constraints on the model and a geometry adaptation module that ensures effective utilization of the geometry features. Extensive experiments show that GeoWorld can generate high-fidelity 3D scenes from a single image and a given camera trajectory, outperforming prior methods both qualitatively and quantitatively. Project Page: https://peaes.github.io/GeoWorld/.
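The geometry adaptation module described above is not specified in detail here, but adapter-style conditioning is commonly implemented as a small projection whose output is added residually to the host model's features, with the output layer zero-initialized so training starts from the unmodified video model (a ControlNet-style choice). The sketch below is a hypothetical minimal version under that assumption; `GeometryAdapter`, its dimensions, and the zero-init scheme are illustrative, not the paper's architecture.

```python
# Illustrative sketch of a lightweight geometry adaptation module
# (an assumption about the design, not the paper's implementation).
import numpy as np

class GeometryAdapter:
    """Projects geometry features and injects them into video-model
    features via a zero-initialized residual branch."""

    def __init__(self, geo_dim, vid_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.w_in = rng.normal(0.0, 0.02, (geo_dim, vid_dim))  # input projection
        self.w_out = np.zeros((vid_dim, vid_dim))              # zero-init output

    def __call__(self, video_feat, geo_feat):
        """video_feat: (N, vid_dim); geo_feat: (N, geo_dim)."""
        h = np.maximum(geo_feat @ self.w_in, 0.0)   # project + ReLU
        return video_feat + h @ self.w_out          # residual injection
```

Because `w_out` starts at zero, the module is an identity map at initialization: the video model's behavior is untouched until training grows the geometry branch, which makes such adapters stable to bolt onto a pretrained backbone.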