CDI3D: Cross-guided Dense-view Interpolation for 3D Reconstruction

📅 2025-03-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
Single-view 3D reconstruction suffers from multi-view inconsistency in diffusion-generated views, and large reconstruction models (LRMs) tend to amplify this geometric incoherence during reconstruction. To address this, we propose CDI3D, an end-to-end feed-forward framework whose Dense View Interpolation (DVI) module synthesizes geometrically consistent views between diffusion-generated main views along a tilt camera trajectory. The interpolated views are then fed jointly with the original input views into a tri-plane-based reconstruction network, which encodes them into tri-plane features and decodes the implicit field into a mesh. The method achieves state-of-the-art performance across multiple benchmarks, significantly improving the geometric accuracy and texture fidelity of the reconstructed 3D meshes while remaining far more efficient at inference than iterative optimization approaches.

📝 Abstract
3D object reconstruction from a single-view image is a fundamental task in computer vision with wide-ranging applications. Recent advancements in Large Reconstruction Models (LRMs) have shown great promise in leveraging multi-view images generated by 2D diffusion models to extract 3D content. However, challenges remain: 2D diffusion models often struggle to produce dense images with strong multi-view consistency, and LRMs tend to amplify these inconsistencies during the 3D reconstruction process. Addressing these issues is critical for achieving high-quality and efficient 3D reconstruction. In this paper, we present CDI3D, a feed-forward framework designed for efficient, high-quality image-to-3D generation with view interpolation. To tackle the aforementioned challenges, we propose to integrate 2D diffusion-based view interpolation into the LRM pipeline to enhance the quality and consistency of the generated mesh. Specifically, our approach introduces a Dense View Interpolation (DVI) module, which synthesizes interpolated images between the main views generated by the 2D diffusion model, effectively densifying the input views with better multi-view consistency. We also design a tilt camera pose trajectory to capture views at different elevations and perspectives. Subsequently, we employ a tri-plane-based mesh reconstruction strategy to extract robust tokens from these interpolated and original views, enabling the generation of high-quality 3D meshes with superior texture and geometry. Extensive experiments demonstrate that our method significantly outperforms previous state-of-the-art approaches across various benchmarks, producing 3D content with enhanced texture fidelity and geometric accuracy.
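The tilt camera pose trajectory described above can be sketched as a simple pose generator: main views are spaced evenly in azimuth, and interpolated views between them vary in elevation so the densified set covers different heights. This is a minimal illustrative sketch; the function name, parameter names, and the sinusoidal elevation schedule are assumptions for demonstration, not the paper's exact formulation.

```python
import math

def tilt_trajectory(n_main=4, n_interp=3, elev_lo=-10.0, elev_hi=30.0):
    """Generate (azimuth, elevation) pairs in degrees for main views plus
    interpolated views along a tilted (elevation-varying) orbit.

    NOTE: hypothetical sketch -- the sinusoidal elevation schedule is an
    assumption, not the trajectory design published in the paper.
    """
    poses = []
    total = n_main * (n_interp + 1)  # main views + interpolated in-betweens
    for i in range(total):
        azim = 360.0 * i / total  # evenly spaced azimuths around the object
        # Oscillate elevation so interpolated views between the main views
        # observe the object from varied heights.
        t = 0.5 * (1.0 + math.sin(2.0 * math.pi * i / total))
        elev = elev_lo + t * (elev_hi - elev_lo)
        poses.append((azim, elev))
    return poses
```

With the defaults this yields 16 poses (4 main views plus 3 interpolated views per gap), all with elevations inside the requested range.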
Problem

Research questions and friction points this paper is trying to address.

Multi-view inconsistency in views generated by 2D diffusion models
Amplification of these inconsistencies by LRMs during 3D reconstruction
Limited texture fidelity and geometric accuracy in the resulting 3D meshes
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dense View Interpolation (DVI) module integrating 2D diffusion-based view interpolation into the LRM pipeline
Tilt camera pose trajectory covering varied elevations and perspectives
Tri-plane-based mesh reconstruction from interpolated and original views