🤖 AI Summary
Single-view 3D reconstruction is hampered by the multi-view inconsistency of diffusion-generated views, and large reconstruction models (LRMs) tend to amplify this geometric incoherence during reconstruction. To address this, we propose CDI3D, an end-to-end feed-forward framework whose Dense View Interpolation (DVI) module synthesizes geometrically consistent dense views between diffusion-generated main views along a tilted camera trajectory. For the first time, the interpolated views are fed jointly with the original input views into a tri-plane-based reconstruction network, which encodes them into tri-plane features and decodes an implicit grid into a mesh. Our method achieves state-of-the-art performance across multiple benchmarks: it significantly improves the geometric accuracy and texture fidelity of the reconstructed 3D meshes while remaining far more efficient at inference than iterative optimization approaches, effectively balancing fidelity and computational cost.
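Conceptually, the feed-forward flow can be sketched as below. This is a minimal illustration, not the authors' released API: the three callables `multiview_diffusion`, `dense_view_interpolator`, and `triplane_lrm` are hypothetical placeholders for the paper's components.

```python
from typing import Callable

import torch
from torch import Tensor


def cdi3d_pipeline(
    image: Tensor,
    multiview_diffusion: Callable[[Tensor], Tensor],
    dense_view_interpolator: Callable[[Tensor], Tensor],
    triplane_lrm: Callable[[Tensor], object],
) -> object:
    """Sketch of the CDI3D feed-forward pipeline (placeholder callables)."""
    # 1) A 2D multi-view diffusion model lifts the single input image
    #    to a sparse set of main views, e.g. shape (N_main, C, H, W).
    main_views = multiview_diffusion(image)

    # 2) The DVI module densifies the inputs by synthesizing consistent
    #    interpolated views between adjacent main views.
    interp_views = dense_view_interpolator(main_views)

    # 3) Original and interpolated views are jointly encoded into
    #    tri-plane features; a mesh is decoded from the implicit field.
    all_views = torch.cat([main_views, interp_views], dim=0)
    return triplane_lrm(all_views)
```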
📝 Abstract
3D object reconstruction from a single-view image is a fundamental task in computer vision with wide-ranging applications. Recent advances in Large Reconstruction Models (LRMs) have shown great promise in leveraging multi-view images generated by 2D diffusion models to extract 3D content. However, challenges remain: 2D diffusion models often struggle to produce dense images with strong multi-view consistency, and LRMs tend to amplify these inconsistencies during 3D reconstruction. Addressing these issues is critical for achieving high-quality and efficient 3D reconstruction. In this paper, we present CDI3D, a feed-forward framework designed for efficient, high-quality image-to-3D generation with view interpolation. To tackle these challenges, we propose to integrate 2D diffusion-based view interpolation into the LRM pipeline to enhance the quality and consistency of the generated mesh. Specifically, our approach introduces a Dense View Interpolation (DVI) module, which synthesizes interpolated images between the main views generated by the 2D diffusion model, effectively densifying the input views with better multi-view consistency. We also design a tilted camera pose trajectory to capture views at different elevations and perspectives. We then employ a tri-plane-based mesh reconstruction strategy to extract robust tokens from the interpolated and original views, enabling the generation of high-quality 3D meshes with superior texture and geometry. Extensive experiments demonstrate that our method significantly outperforms previous state-of-the-art approaches across various benchmarks, producing 3D content with enhanced texture fidelity and geometric accuracy.
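To make the tilted camera pose trajectory concrete, the sketch below shows one plausible parameterization: azimuth and elevation are linearly interpolated between two adjacent main views, with a sinusoidal elevation offset so the interpolated cameras sweep above and below the main-view orbit. The abstract does not specify the exact parameterization, so the function `tilted_trajectory` and its `tilt_deg` parameter are illustrative assumptions.

```python
import numpy as np


def tilted_trajectory(azim_a, azim_b, elev_a, elev_b, n_interp, tilt_deg=10.0):
    """Camera poses (azimuth, elevation) between two adjacent main views.

    Illustrative assumption, not the paper's exact scheme: linear
    interpolation of azimuth/elevation plus a sinusoidal elevation
    offset of amplitude `tilt_deg`, so interpolated views cover
    different elevations and perspectives. Angles are in degrees.
    """
    # Interior sample positions, excluding the two main-view endpoints.
    t = np.linspace(0.0, 1.0, n_interp + 2)[1:-1]
    azim = (1.0 - t) * azim_a + t * azim_b
    elev = (1.0 - t) * elev_a + t * elev_b + tilt_deg * np.sin(np.pi * t)
    return np.stack([azim, elev], axis=-1)  # shape (n_interp, 2)


# Example: three interpolated poses between main views at azimuth 0° and 60°.
poses = tilted_trajectory(0.0, 60.0, 20.0, 20.0, n_interp=3)
```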