LucidFusion: Reconstructing 3D Gaussians with Arbitrary Unposed Images

📅 2024-10-21
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the limitations of single-image 3D reconstruction, namely its reliance on accurate camera pose estimation and poor generalization, this paper proposes a pose-free 3D Gaussian reconstruction method that accepts an arbitrary number of views. The approach eliminates explicit pose estimation by (1) introducing the Relative Coordinate Map (RCM), which aligns unposed multi-view inputs to a main view by encoding geometry through relative spatial relationships rather than absolute camera poses; (2) extending RCM with Relative Coordinate Gaussians (RCG), which treat each pixel's coordinates as a Gaussian center to counteract the noise that arises without global 3D supervision; and (3) establishing an end-to-end differentiable image-to-3D-Gaussian translation framework that jointly optimizes implicit geometric learning and differentiable rasterization. The method achieves high-fidelity 3D reconstruction within seconds and without pose supervision, significantly improving cross-view consistency and geometric robustness, and attains state-of-the-art accuracy and generalization on standard benchmarks.
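The RCM representation can be pictured as an image whose pixels store 3D coordinates expressed in the main view's camera frame. A minimal sketch of building such a map for the main view from a depth map (this is not the paper's learned network; the intrinsics and the `depth_to_rcm` helper are assumed here purely for illustration):

```python
import numpy as np

def depth_to_rcm(depth, fx, fy, cx, cy):
    """Back-project a depth map into an H x W x 3 coordinate map.

    Each pixel stores the 3D point it observes, expressed in the
    camera frame of this (main) view. In LucidFusion the network
    predicts such maps for all views directly, so no intrinsics or
    poses are needed at inference; they appear here only to make
    the toy example concrete.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel grid
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return np.stack([x, y, depth], axis=-1)  # shape (H, W, 3)

# Toy 4x4 depth map with unit depth everywhere.
rcm = depth_to_rcm(np.ones((4, 4)), fx=2.0, fy=2.0, cx=2.0, cy=2.0)
print(rcm.shape)   # (4, 4, 3)
print(rcm[2, 2])   # pixel at the principal point -> [0. 0. 1.]
```

Once every view's pixels live in this shared coordinate frame, multi-view fusion reduces to concatenating coordinate maps, with no per-view pose to estimate.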

📝 Abstract
Recent large reconstruction models have made notable progress in generating high-quality 3D objects from single images. However, current reconstruction methods often rely on explicit camera pose estimation or fixed viewpoints, restricting their flexibility and practical applicability. We reformulate 3D reconstruction as image-to-image translation and introduce the Relative Coordinate Map (RCM), which aligns multiple unposed images to a main view without pose estimation. While RCM simplifies the process, its lack of global 3D supervision can yield noisy outputs. To address this, we propose Relative Coordinate Gaussians (RCG) as an extension to RCM, which treats each pixel's coordinates as a Gaussian center and employs differentiable rasterization for consistent geometry and pose recovery. Our LucidFusion framework handles an arbitrary number of unposed inputs, producing robust 3D reconstructions within seconds and paving the way for more flexible, pose-free 3D pipelines.
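The RCG step described above can be pictured as promoting each RCM pixel to a 3D Gaussian: the pixel's stored coordinates become the Gaussian's center, and per-pixel color, scale, and opacity complete the primitive that a differentiable rasterizer then renders for supervision. A minimal data-layout sketch (field names and the `rcm_to_gaussians` helper are illustrative assumptions, not the paper's API; the rasterizer itself is omitted):

```python
import numpy as np

def rcm_to_gaussians(rcm, rgb, scale=0.01, opacity=1.0):
    """Turn an H x W x 3 coordinate map plus an RGB image into a flat
    set of Gaussian primitives, one per pixel, centered at the pixel's
    3D coordinates. Rendering these with a differentiable rasterizer
    is what lets gradients enforce globally consistent geometry; this
    sketch stops before rasterization.
    """
    n = rcm.shape[0] * rcm.shape[1]
    return {
        "centers": rcm.reshape(n, 3),        # 3D means of the Gaussians
        "colors": rgb.reshape(n, 3),         # per-Gaussian color
        "scales": np.full((n, 3), scale),    # isotropic size (toy choice)
        "opacities": np.full((n,), opacity), # alpha used when splatting
    }

gaussians = rcm_to_gaussians(np.zeros((4, 4, 3)), np.ones((4, 4, 3)))
print(gaussians["centers"].shape)  # (16, 3): one Gaussian per pixel
```

Because every Gaussian inherits its center from the coordinate map, rasterization losses flow back into the predicted coordinates, which is how the framework cleans up the noise RCM alone would leave.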
Problem

Research questions and friction points this paper is trying to address.

Reconstructing 3D objects from unposed images without camera pose estimation.
Addressing noisy outputs in 3D reconstruction due to lack of global supervision.
Enabling flexible, pose-free 3D reconstruction with arbitrary unposed image inputs.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Relative Coordinate Map aligns unposed images
Relative Coordinate Gaussians ensure consistent geometry
LucidFusion enables pose-free 3D reconstruction