🤖 AI Summary
Existing single-image 3D generation methods often produce inconsistent multi-view geometry and texture because they rely on insufficient 3D priors. This paper introduces NOVA3D, a single-image-to-3D generation framework built on a pretrained video diffusion model, exploiting its strong implicit spatio-temporal and geometric priors to enhance 3D consistency. The key contributions are: (1) a Geometry-Temporal Alignment (GTA) attention mechanism that exchanges information between the color and geometry domains, enforcing cross-view geometric constraints and inter-frame coherence; and (2) a de-conflict geometry fusion algorithm that improves texture fidelity by resolving multi-view inaccuracies and pose-alignment discrepancies. Combined with Score Distillation Sampling and multi-view geometric regularization, the method achieves marked improvements in multi-view consistency and texture fidelity on benchmarks including ShapeNet and Objaverse, surpassing current state-of-the-art approaches.
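For context on the distillation step both sections reference: the standard Score Distillation Sampling gradient, as introduced in DreamFusion, is shown below. NOVA3D builds on this objective, possibly with its own conditioning and weighting; the formula here is the generic form, not the paper's exact variant.

$$
\nabla_{\theta}\,\mathcal{L}_{\mathrm{SDS}} \;=\; \mathbb{E}_{t,\epsilon}\!\left[\, w(t)\,\bigl(\hat{\epsilon}_{\phi}(x_{t};\,y,\,t) - \epsilon\bigr)\,\frac{\partial x}{\partial \theta} \right]
$$

Here $x = g(\theta)$ is a rendering of the 3D representation $\theta$, $x_t$ is its noised version at timestep $t$, $\hat{\epsilon}_{\phi}$ is the frozen diffusion model's noise prediction conditioned on $y$ (e.g., the input image or its derived views), $\epsilon$ is the injected Gaussian noise, and $w(t)$ is a timestep-dependent weight.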
📝 Abstract
3D AI-generated content (AIGC) has made it increasingly easy for anyone to become a 3D content creator. While recent methods leverage Score Distillation Sampling to distill 3D objects from pretrained image diffusion models, they often suffer from inadequate 3D priors, leading to insufficient multi-view consistency. In this work, we introduce NOVA3D, an innovative single-image-to-3D generation framework. Our key insight lies in leveraging strong 3D priors from a pretrained video diffusion model and integrating geometric information during multi-view video fine-tuning. To facilitate information exchange between the color and geometry domains, we propose the Geometry-Temporal Alignment (GTA) attention mechanism, which improves generalization and multi-view consistency. Moreover, we introduce the de-conflict geometry fusion algorithm, which improves texture fidelity by addressing multi-view inaccuracies and resolving discrepancies in pose alignment. Extensive experiments validate the superiority of NOVA3D over existing baselines.
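The abstract describes GTA as attention that lets color and geometry features exchange information across the views/frames of the generated video. A minimal PyTorch sketch of one plausible realization is below; the module name `GTASketch`, the token layout, and the joint self-attention design are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn


class GTASketch(nn.Module):
    """Hypothetical sketch of a Geometry-Temporal Alignment attention block.

    Color (RGB) and geometry (e.g., normal-map) feature tokens from all
    views/frames attend to one another jointly, so appearance and geometry
    stay aligned across the multi-view video. Shapes and design choices are
    illustrative only.
    """

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, color_tokens: torch.Tensor, geom_tokens: torch.Tensor):
        # color_tokens, geom_tokens: (batch, views * tokens_per_view, dim)
        b, n, d = color_tokens.shape
        # Joint sequence: every color token can attend to every geometry
        # token (and vice versa) across all views/frames.
        joint = torch.cat([color_tokens, geom_tokens], dim=1)  # (b, 2n, d)
        h = self.norm(joint)
        out, _ = self.attn(h, h, h, need_weights=False)
        joint = joint + out  # residual connection
        # Split back into the two domains for the rest of the network.
        return joint[:, :n], joint[:, n:]


if __name__ == "__main__":
    block = GTASketch(dim=64)
    color = torch.randn(2, 4 * 16, 64)  # 4 views x 16 tokens per view
    geom = torch.randn(2, 4 * 16, 64)
    c, g = block(color, geom)
    print(c.shape, g.shape)  # torch.Size([2, 64, 64]) for each domain
```

Joint self-attention over the concatenated sequence is one simple way to realize "information exchange between domains"; the actual GTA mechanism may instead use cross-attention or domain-specific projections.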