🤖 AI Summary
This work addresses online 3D reconstruction of freely moving objects from monocular video. The setting is difficult: camera poses are unknown, no depth priors are available, object motion is arbitrary, and reliable geometric cues are scarce, yet the goal is high-fidelity, object-centric, real-time modeling. The authors propose an online Gaussian reconstruction framework built on a feed-forward network. Its core innovation is a dual-key memory module that combines implicit appearance-geometry keys with explicit directional keys, enabling robust state aggregation and spatially guided memory retrieval. The method uses temporal feature aggregation, spatially guided sparse readout, and an efficient Gaussian sparsification mechanism to maintain a dynamically updated, dense Gaussian primitive field. On real-world datasets, the approach significantly outperforms existing pose-free methods. Reconstruction quality improves steadily as more frames are observed, while memory footprint and computational cost remain constant.
📝 Abstract
Free-moving object reconstruction from monocular video remains challenging, particularly without reliable pose or depth cues and under arbitrary object motion. We introduce OnlineSplatter, a novel online feed-forward framework that generates high-quality, object-centric 3D Gaussians directly from RGB frames without requiring camera pose, depth priors, or bundle optimization. Our approach anchors reconstruction using the first frame and progressively refines the object representation through a dense Gaussian primitive field, maintaining constant computational cost regardless of video sequence length. Our core contribution is a dual-key memory module that combines latent appearance-geometry keys with explicit directional keys, robustly fusing current frame features with temporally aggregated object states. This design enables effective handling of free-moving objects via spatially guided memory readout and an efficient sparsification mechanism, ensuring comprehensive yet compact object coverage. Evaluations on real-world datasets demonstrate that OnlineSplatter significantly outperforms state-of-the-art pose-free reconstruction baselines, consistently improving with more observations while maintaining constant memory and runtime.
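To make the dual-key idea concrete, the following is a minimal toy sketch, not the OnlineSplatter implementation. It assumes each memory slot stores a latent key, a unit direction key, and a value vector; that readout scores slots by a scaled latent dot product plus a weighted direction similarity; and that a full memory absorbs new entries by averaging them into the slot with the closest direction. The class name `DualKeyMemory`, the parameters `dir_weight` and `top_k`, and the merge rule are all hypothetical choices made for illustration.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _normalize(v):
    n = math.sqrt(_dot(v, v))
    return [x / n for x in v] if n > 0 else list(v)


class DualKeyMemory:
    """Toy fixed-capacity memory addressed by two keys per slot (hypothetical design)."""

    def __init__(self, capacity, dir_weight=1.0, top_k=2):
        self.capacity = capacity      # maximum number of slots, never exceeded
        self.dir_weight = dir_weight  # assumed weight of the directional term
        self.top_k = top_k            # number of slots used in a sparse readout
        self.slots = []               # each slot: [latent_key, direction_key, value, count]

    def write(self, latent_key, direction, value):
        direction = _normalize(direction)
        if len(self.slots) < self.capacity:
            self.slots.append([list(latent_key), direction, list(value), 1])
            return
        # Memory is full: merge into the slot with the most similar direction
        # (running average), so the footprint stays constant.
        i = max(range(len(self.slots)), key=lambda j: _dot(self.slots[j][1], direction))
        lk, dk, val, n = self.slots[i]

        def blend(old, new):
            return [(o * n + w) / (n + 1) for o, w in zip(old, new)]

        self.slots[i] = [blend(lk, latent_key), _normalize(blend(dk, direction)),
                         blend(val, value), n + 1]

    def read(self, latent_query, direction_query):
        if not self.slots:
            raise ValueError("memory is empty")
        direction_query = _normalize(direction_query)
        scale = math.sqrt(len(latent_query))
        # Combined score: latent similarity plus weighted direction similarity.
        scores = [
            _dot(latent_query, lk) / scale + self.dir_weight * _dot(direction_query, dk)
            for lk, dk, _, _ in self.slots
        ]
        # Sparse readout: softmax restricted to the top_k highest-scoring slots.
        top = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[: self.top_k]
        m = max(scores[i] for i in top)
        weights = [math.exp(scores[i] - m) for i in top]
        total = sum(weights)
        dim = len(self.slots[0][2])
        return [sum(w / total * self.slots[i][2][d] for w, i in zip(weights, top))
                for d in range(dim)]
```

In this sketch the number of slots never exceeds `capacity`, so both storage and the cost of one readout are bounded regardless of how many frames are written, which mirrors the constant memory and runtime property claimed in the abstract. How the actual method forms its keys and sparsifies its Gaussians is not specified here.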