FreeSplatter: Pose-free Gaussian Splatting for Sparse-view 3D Reconstruction

📅 2024-12-12
🏛️ arXiv.org
📈 Citations: 9
Influential: 1
🤖 AI Summary
Sparse-view 3D reconstruction models typically depend on accurate, known camera poses, which are hard to recover from a handful of uncalibrated images. To address this, we propose FreeSplatter, a feed-forward Gaussian Splatting framework that requires no pose priors: it predicts pixel-wise 3D Gaussian primitives directly from uncalibrated sparse-view images, places them in a unified reference frame, and recovers camera intrinsics and extrinsics from them with off-the-shelf solvers. The model is a streamlined Transformer built from sequential self-attention blocks that exchange information among multi-view image tokens. We train two variants, one for object-centric and one for scene-level reconstruction. In both settings, FreeSplatter outperforms state-of-the-art baselines in reconstruction quality and pose estimation accuracy, with inference taking only seconds, and it shows potential for speeding up downstream text/image-to-3D content creation.

📝 Abstract
Existing sparse-view reconstruction models heavily rely on accurate known camera poses. However, deriving camera extrinsics and intrinsics from sparse-view images presents significant challenges. In this work, we present FreeSplatter, a highly scalable, feed-forward reconstruction framework capable of generating high-quality 3D Gaussians from uncalibrated sparse-view images and recovering their camera parameters in mere seconds. FreeSplatter is built upon a streamlined transformer architecture, comprising sequential self-attention blocks that facilitate information exchange among multi-view image tokens and decode them into pixel-wise 3D Gaussian primitives. The predicted Gaussian primitives are situated in a unified reference frame, allowing for high-fidelity 3D modeling and instant camera parameter estimation using off-the-shelf solvers. To cater to both object-centric and scene-level reconstruction, we train two model variants of FreeSplatter on extensive datasets. In both scenarios, FreeSplatter outperforms state-of-the-art baselines in terms of reconstruction quality and pose estimation accuracy. Furthermore, we showcase FreeSplatter's potential in enhancing the productivity of downstream applications, such as text/image-to-3D content creation.
Problem

Research questions and friction points this paper is trying to address.

Reconstructing 3D from sparse views without camera poses
Estimating camera parameters from uncalibrated sparse images
Generating high-quality 3D Gaussians in a unified framework
Innovation

Methods, ideas, or system contributions that make the work stand out.

Pose-free Gaussian splatting for sparse-view reconstruction
Transformer architecture with self-attention for multi-view information exchange
Generates pixel-aligned 3D Gaussians in a unified reference frame
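
Because the predicted Gaussians are pixel-aligned, every pixel comes paired with a 3D point, which is what lets a generic solver recover camera parameters. The abstract only says "off-the-shelf solvers" and does not name one, so the sketch below is an illustration, not FreeSplatter's procedure. It fits a focal length by closed-form least squares for the view whose camera frame serves as the reference frame, assuming square pixels and a principal point at the image centre. The function names `estimate_focal` and `synthetic_point_map` are hypothetical.

```python
# Illustrative sketch only: a closed-form focal-length fit, not FreeSplatter's solver.

def estimate_focal(points, width, height):
    """Least-squares focal length from a pixel-aligned point map.

    points[v][u] is the (X, Y, Z) centre predicted for pixel (u, v), expressed
    in that view's own camera frame (assumption). The principal point is
    assumed to sit at the image centre and pixels are assumed square.
    """
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    num = 0.0
    den = 0.0
    for v in range(height):
        for u in range(width):
            X, Y, Z = points[v][u]
            if Z <= 0:
                continue  # skip points at or behind the camera plane
            a, b = X / Z, Y / Z  # normalised image coordinates
            # Pinhole model: (u - cx) = f * a and (v - cy) = f * b.
            num += (u - cx) * a + (v - cy) * b
            den += a * a + b * b
    if den == 0.0:
        raise ValueError("degenerate point map: cannot estimate focal length")
    return num / den


def synthetic_point_map(width, height, focal, depth):
    """Back-project every pixel onto a fronto-parallel plane at a fixed depth."""
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    return [[((u - cx) * depth / focal, (v - cy) * depth / focal, depth)
             for u in range(width)]
            for v in range(height)]


if __name__ == "__main__":
    point_map = synthetic_point_map(width=8, height=6, focal=50.0, depth=2.0)
    print(estimate_focal(point_map, width=8, height=6))
```

For the remaining views, the same pixel-to-point pairs could be passed to a perspective-n-point (PnP) solver to obtain extrinsics relative to the reference frame; which solver the authors use is not stated in this summary.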