🤖 AI Summary
Single-image 3D reconstruction suffers from cross-view inconsistency (CVC): multi-view images synthesized from a single input exhibit geometric and appearance discrepancies across viewpoints, severely degrading reconstruction fidelity. To address this, we propose AlignCVC, a framework that enforces distribution-level alignment between generated and reconstructed views through a soft-hard dual alignment strategy, moving beyond conventional regression-based losses. AlignCVC supports end-to-end optimization and plug-and-play integration with diverse multi-view generation and 3D reconstruction backbones. Experiments demonstrate state-of-the-art reconstruction accuracy and visual consistency, while accelerating inference to as few as four steps and generalizing well across architectures.
📝 Abstract
Single-image-to-3D models typically follow a sequential generation and reconstruction workflow. However, intermediate multi-view images synthesized by pre-trained generation models often lack cross-view consistency (CVC), significantly degrading 3D reconstruction performance. While recent methods attempt to refine CVC by feeding reconstruction results back into the multi-view generator, these approaches struggle with noisy and unstable reconstruction outputs, which limits effective CVC improvement. We introduce AlignCVC, a novel framework that fundamentally reframes single-image-to-3D generation through distribution alignment rather than strict regression losses. Our key insight is to align both the generated and the reconstructed multi-view distributions toward the ground-truth multi-view distribution, establishing a principled foundation for improved CVC. Observing that generated images exhibit weak CVC while reconstructed images display strong CVC due to explicit rendering, we propose a soft-hard alignment strategy with distinct objectives for the generation and reconstruction models. This approach not only enhances generation quality but also dramatically accelerates inference to as few as 4 steps. As a plug-and-play paradigm, AlignCVC seamlessly integrates various multi-view generation models with 3D reconstruction models. Extensive experiments demonstrate the effectiveness and efficiency of AlignCVC for single-image-to-3D generation.
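The abstract contrasts a soft, distribution-level objective for the weakly consistent generator with a hard, regression-style objective for the strongly consistent (rendered) reconstruction. The paper's exact losses are not given here, so the following is a minimal illustrative sketch on toy arrays: hard alignment as per-pixel MSE, and soft alignment approximated by matching per-view moment statistics (both the data shapes and the moment-matching proxy are assumptions, not the authors' implementation).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: 4 "views", each an 8x8 single-channel image.
generated = rng.normal(0.0, 1.0, size=(4, 8, 8))      # generator output: weak CVC
reconstructed = rng.normal(0.2, 0.7, size=(4, 8, 8))  # rendered output: strong CVC
ground_truth = rng.normal(0.2, 0.6, size=(4, 8, 8))   # target multi-view set

def hard_alignment_loss(pred, target):
    """Strict per-pixel regression (MSE): reasonable when the prediction is
    already geometrically consistent, as with explicitly rendered views."""
    return float(np.mean((pred - target) ** 2))

def soft_alignment_loss(pred, target):
    """Distribution-level proxy (an illustrative assumption): match first and
    second moments of per-view pixel statistics instead of exact pixel values,
    so the generator is not over-penalized for pixel-level mismatch."""
    mu_p, mu_t = pred.mean(axis=(1, 2)), target.mean(axis=(1, 2))
    sd_p, sd_t = pred.std(axis=(1, 2)), target.std(axis=(1, 2))
    return float(np.mean((mu_p - mu_t) ** 2 + (sd_p - sd_t) ** 2))

loss_gen = soft_alignment_loss(generated, ground_truth)      # soft path: generator
loss_rec = hard_alignment_loss(reconstructed, ground_truth)  # hard path: reconstructor
print(loss_gen, loss_rec)
```

The design intuition mirrors the abstract: the hard loss demands exact agreement and suits outputs that are already cross-view consistent, while the soft loss only pulls the generated views toward the target distribution.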