🤖 AI Summary
Sparse-view 3D reconstruction models typically depend on accurate, known camera poses, which are hard to obtain from only a handful of images. FreeSplatter is a feed-forward Gaussian Splatting framework that needs no pose priors: from uncalibrated sparse-view images it predicts pixel-wise 3D Gaussian primitives in a unified reference frame, from which camera intrinsics and extrinsics can then be recovered with off-the-shelf solvers. The model is a streamlined transformer whose sequential self-attention blocks exchange information among multi-view image tokens before decoding them into Gaussians. Two variants are trained, one for object-centric and one for scene-level reconstruction. In both settings the method is reported to outperform state-of-the-art baselines in reconstruction quality and pose estimation accuracy. Reconstruction and camera recovery take only seconds, which shows potential for speeding up downstream text/image-to-3D content creation.
📝 Abstract
Existing sparse-view reconstruction models heavily rely on accurate known camera poses. However, deriving camera extrinsics and intrinsics from sparse-view images presents significant challenges. In this work, we present FreeSplatter, a highly scalable, feed-forward reconstruction framework capable of generating high-quality 3D Gaussians from uncalibrated sparse-view images and recovering their camera parameters in mere seconds. FreeSplatter is built upon a streamlined transformer architecture, comprising sequential self-attention blocks that facilitate information exchange among multi-view image tokens and decode them into pixel-wise 3D Gaussian primitives. The predicted Gaussian primitives are situated in a unified reference frame, allowing for high-fidelity 3D modeling and instant camera parameter estimation using off-the-shelf solvers. To cater to both object-centric and scene-level reconstruction, we train two model variants of FreeSplatter on extensive datasets. In both scenarios, FreeSplatter outperforms state-of-the-art baselines in terms of reconstruction quality and pose estimation accuracy. Furthermore, we showcase FreeSplatter's potential in enhancing the productivity of downstream applications, such as text/image-to-3D content creation.
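To illustrate why pixel-wise 3D primitives make camera parameter recovery cheap, here is a minimal sketch that estimates a focal length from a pixel-aligned point map by closed-form least squares. It is not the authors' solver. It assumes a pinhole camera, a principal point at the image center, and points already expressed in that camera's own frame (as would hold for a reference view). The names `make_point_map` and `estimate_focal` are hypothetical helpers introduced only for this example. Recovering extrinsics for the other views would typically need a PnP-style solver, which is not shown here.

```python
import math
import random


def make_point_map(width, height, focal, noise=0.0, seed=0):
    """Synthesize a pixel-aligned point map: one 3D point per pixel (camera frame)."""
    rng = random.Random(seed)
    # Assumption: the principal point sits at the image center.
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    pixels, points = [], []
    for v in range(height):
        for u in range(width):
            # Smooth, strictly positive synthetic depth.
            z = 2.0 + 0.5 * math.sin(0.3 * u) * math.cos(0.2 * v)
            # Back-project the pixel through a pinhole camera, with optional noise.
            x = (u - cx) * z / focal + rng.gauss(0.0, noise)
            y = (v - cy) * z / focal + rng.gauss(0.0, noise)
            pixels.append((u, v))
            points.append((x, y, z))
    return pixels, points, (cx, cy)


def estimate_focal(pixels, points, cx, cy):
    """Least-squares focal length f minimizing
    sum (f*x/z - (u - cx))^2 + (f*y/z - (v - cy))^2 over all pixels."""
    num = 0.0
    den = 0.0
    for (u, v), (x, y, z) in zip(pixels, points):
        if z <= 1e-8:
            continue  # skip points at or behind the camera plane
        a, b = x / z, y / z
        num += a * (u - cx) + b * (v - cy)
        den += a * a + b * b
    if den == 0.0:
        raise ValueError("no usable points to estimate the focal length")
    return num / den


if __name__ == "__main__":
    pixels, points, (cx, cy) = make_point_map(32, 32, focal=40.0, noise=0.01)
    print("estimated focal length:", estimate_focal(pixels, points, cx, cy))
```

Because every pixel contributes a 2D-3D correspondence, the estimate is a single pass over the image with no iterative optimization, which is consistent with the abstract's claim that camera parameters can be recovered almost instantly once the primitives are predicted.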