🤖 AI Summary
This work addresses the inconsistency in optimization objectives and pose instability arising from the fragmented treatment of feature extraction, matching, structure-from-motion (SfM), and novel view synthesis in traditional 3D reconstruction pipelines. To this end, we propose GloSplat, a framework that, for the first time in 3D Gaussian splatting, explicitly incorporates SfM feature tracks as first-class optimizable entities. By jointly leveraging reprojection loss and photometric supervision, GloSplat enables geometrically stable and fine-grained co-optimization of camera poses and appearance. The method includes two variants: GloSplat-F, an efficient COLMAP-free version that achieves state-of-the-art performance among COLMAP-free approaches, and GloSplat-A, a high-accuracy variant that surpasses all COLMAP-based baselines, delivering significant improvements in both reconstruction speed and accuracy.
📝 Abstract
Feature extraction, matching, structure from motion (SfM), and novel view synthesis (NVS) have traditionally been treated as separate problems with independent optimization objectives. We present GloSplat, a framework that performs joint pose-appearance optimization during 3D Gaussian Splatting (3DGS) training. Unlike prior joint optimization methods (BARF, NeRF--, 3RGS) that rely purely on photometric gradients for pose refinement, GloSplat preserves explicit SfM feature tracks as first-class entities throughout training: track 3D points are maintained as optimizable parameters separate from the Gaussian primitives, providing persistent geometric anchors via a reprojection loss that operates alongside photometric supervision. This architectural choice prevents early-stage pose drift while enabling fine-grained refinement, a capability absent in photometric-only approaches. We introduce two pipeline variants: (1) GloSplat-F, a COLMAP-free variant using retrieval-based pair selection for efficient reconstruction, and (2) GloSplat-A, an exhaustive matching variant for maximum quality. Both employ global SfM initialization followed by joint photometric-geometric optimization during 3DGS training. Experiments demonstrate that GloSplat-F achieves state-of-the-art results among COLMAP-free methods, while GloSplat-A surpasses all COLMAP-based baselines.
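To make the idea of a reprojection loss operating alongside photometric supervision concrete, the following is a minimal sketch of how such a joint objective could be evaluated. It is not the authors' implementation: the pinhole camera model, the squared pixel error, the L1 photometric term, the weight `lam`, and all names (`Camera`, `project`, `reprojection_loss`, `photometric_loss`, `joint_loss`) are assumptions made for illustration. The abstract does not specify the exact form of either term. In training, gradients of this scalar would flow to both the camera poses and the track 3D points; here only the forward evaluation is shown.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Camera:
    """Hypothetical pinhole camera: world-to-camera rotation R (3x3), translation t."""
    R: List[List[float]]
    t: List[float]
    f: float   # focal length in pixels (assumed equal in x and y)
    cx: float  # principal point x
    cy: float  # principal point y


def project(cam: Camera, X: Sequence[float]) -> Tuple[float, float]:
    """Project a 3D world point X into pixel coordinates."""
    xc = [sum(cam.R[r][c] * X[c] for c in range(3)) + cam.t[r] for r in range(3)]
    return (cam.f * xc[0] / xc[2] + cam.cx, cam.f * xc[1] / xc[2] + cam.cy)


def reprojection_loss(
    cameras: Sequence[Camera],
    track_points: Sequence[Sequence[float]],
    observations: Sequence[Tuple[int, int, Tuple[float, float]]],
) -> float:
    """Mean squared pixel error over (camera index, track index, observed pixel) triples.

    The track points are kept in their own list, separate from any Gaussian
    primitives, mirroring the 'separate optimizable parameters' described above.
    """
    total = 0.0
    for cam_idx, pt_idx, (u, v) in observations:
        pu, pv = project(cameras[cam_idx], track_points[pt_idx])
        total += (pu - u) ** 2 + (pv - v) ** 2
    return total / len(observations)


def photometric_loss(rendered: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute difference between rendered and target pixel values (assumed L1)."""
    return sum(abs(a - b) for a, b in zip(rendered, target)) / len(rendered)


def joint_loss(rendered, target, cameras, track_points, observations, lam=0.1) -> float:
    """Photometric term plus a weighted reprojection term; lam is an assumed weight."""
    return photometric_loss(rendered, target) + lam * reprojection_loss(
        cameras, track_points, observations
    )


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    cam = Camera(R=identity, t=[0.0, 0.0, 0.0], f=100.0, cx=50.0, cy=50.0)
    points = [[0.0, 0.0, 2.0]]
    obs = [(0, 0, (53.0, 54.0))]  # observed keypoint, offset from the projection
    print(joint_loss([0.5, 0.5], [0.4, 0.7], [cam], points, obs))
```

Because the reprojection term depends only on poses and track points, it stays informative even when the rendered image is still poor early in training, which is the intuition behind the "persistent geometric anchors" described in the abstract.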