🤖 AI Summary
This work addresses novel view synthesis (NVS), proposing a multi-view diffusion model enhanced with 3D priors derived from metric depth and camera poses. The method introduces a 3D spatial warping module coupled with a metric depth alignment mechanism to enforce geometric consistency across synthesized views. Leveraging the large-scale multi-view dataset MvD-1M (1.6 million scenes) and a customized training strategy, it generates up to 100 high-fidelity, geometrically coherent novel views from variable reference views in a single forward pass. Compared to prior approaches, the method markedly improves cross-scene generalization and 3D structural fidelity, achieving state-of-the-art performance on both in-domain and out-of-domain NVS benchmarks. Code and pretrained models will be released.
📝 Abstract
We introduce MVGenMaster, a multi-view diffusion model enhanced with 3D priors to address versatile Novel View Synthesis (NVS) tasks. MVGenMaster leverages 3D priors that are warped using metric depth and camera poses, significantly enhancing both generalization and 3D consistency in NVS. Our model features a simple yet effective pipeline that can generate up to 100 novel views conditioned on variable reference views and camera poses within a single forward process. Additionally, we have developed a comprehensive large-scale multi-view image dataset called MvD-1M, comprising up to 1.6 million scenes equipped with well-aligned metric depth, to train MVGenMaster. Moreover, we present several training and model modifications to strengthen the model with scaled-up datasets. Extensive evaluations across in- and out-of-domain benchmarks demonstrate the effectiveness of our proposed method and data formulation. Models and code will be released at https://github.com/ewrfcas/MVGenMaster/.
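To make the core idea of the 3D priors concrete, the sketch below illustrates the standard geometry behind depth-based view warping: reference-view pixels are unprojected with metric depth, transformed by the relative camera pose, and reprojected into the target view. This is a minimal illustration of the general technique, not the authors' implementation; the function name, the shared-intrinsics assumption, and the pinhole camera model are all assumptions.

```python
import numpy as np

def warp_reference_view(depth, K, T_ref_to_tgt):
    """Illustrative depth-based warping (hypothetical helper, not MVGenMaster code).

    depth:        (H, W) metric depth map of the reference view
    K:            (3, 3) pinhole intrinsics (assumed shared by both views)
    T_ref_to_tgt: (4, 4) relative pose mapping reference-camera to target-camera coords
    Returns (H, W, 2) target-view pixel coordinates and the warped depth (H, W).
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T  # (3, H*W)

    # Back-project pixels to 3D points in the reference camera frame.
    pts_ref = np.linalg.inv(K) @ pix * depth.reshape(1, -1)

    # Move the points into the target camera frame.
    pts_h = np.vstack([pts_ref, np.ones((1, pts_ref.shape[1]))])
    pts_tgt = (T_ref_to_tgt @ pts_h)[:3]

    # Project into the target image plane (guard against division by ~0 depth).
    proj = K @ pts_tgt
    z = proj[2]
    uv = (proj[:2] / np.clip(z, 1e-6, None)).T.reshape(H, W, 2)
    return uv, z.reshape(H, W)
```

With an identity relative pose, every pixel maps back to itself and the warped depth equals the input depth, which is a quick sanity check on the geometry.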