DynOMo: Online Point Tracking by Dynamic Online Monocular Gaussian Reconstruction

📅 2024-09-03
🏛️ arXiv.org
📈 Citations: 2
✨ Influential: 0
🤖 AI Summary
This paper addresses the challenging problem of online 2D/3D point tracking from monocular videos with unknown camera poses. Methodologically, it introduces an end-to-end differentiable dynamic online reconstruction framework featuring: (1) a monocular online point trajectory generation mechanism; (2) a correspondence-free, similarity-enhancing regularizer; (3) an extended 3D Gaussian splatting formulation jointly modeling scene geometry, object motion, and camera pose; and (4) camera motion estimation from a single RGB frame without known poses. Contributions include: (i) establishing the first baseline for online point tracking with unposed monocular cameras; (ii) achieving robust trajectory tracking on real-world dynamic sequences; and (iii) matching the performance of existing offline or multi-view methods, thereby improving practicality for real-time applications such as robot navigation and mixed reality.

๐Ÿ“ Abstract
Reconstructing scenes and tracking motion are two sides of the same coin. Tracking points allows for geometric reconstruction [14], while geometric reconstruction of (dynamic) scenes allows for 3D tracking of points over time [24, 39]. The latter was recently also exploited for 2D point tracking to overcome occlusion ambiguities by lifting tracking directly into 3D [38]. However, the above approaches either require offline processing or multi-view camera setups, both unrealistic for real-world applications like robot navigation or mixed reality. We target the challenge of online 2D and 3D point tracking from unposed monocular camera input, introducing Dynamic Online Monocular Reconstruction (DynOMo). We leverage 3D Gaussian splatting to reconstruct dynamic scenes in an online fashion. Our approach extends 3D Gaussians to capture new content and object motions while estimating camera movement from a single RGB frame. DynOMo stands out by enabling the emergence of point trajectories through robust image feature reconstruction and a novel similarity-enhanced regularization term, without requiring any correspondence-level supervision. It sets the first baseline for online point tracking with monocular unposed cameras, achieving performance on par with existing methods. We aim to inspire the community to advance online point tracking and reconstruction, expanding the applicability to diverse real-world scenarios.
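The abstract describes a similarity-enhanced regularizer that lets trajectories emerge without correspondence supervision. The paper's implementation is not shown here; as a rough, hypothetical illustration of the general idea, the sketch below penalizes motion differences between Gaussians whose reconstructed image features are similar. The function name, the k-nearest-neighbor weighting, and all array shapes are assumptions for illustration, not the authors' formulation.

```python
import numpy as np

def similarity_regularizer(feats, displacements, k=8):
    """Illustrative sketch: encourage Gaussians with similar appearance
    features to move coherently between frames.

    feats:         (N, D) L2-normalized per-Gaussian feature vectors.
    displacements: (N, 3) per-Gaussian 3D motion between consecutive frames.
    Returns a scalar penalty on motion differences among feature neighbors.
    """
    sim = feats @ feats.T                     # cosine similarity (features are normalized)
    np.fill_diagonal(sim, -1.0)               # exclude self-matches from neighbor search
    idx = np.argsort(-sim, axis=1)[:, :k]     # indices of the k most similar Gaussians
    w = np.clip(np.take_along_axis(sim, idx, axis=1), 0.0, None)   # (N, k) weights
    diff = displacements[:, None, :] - displacements[idx]          # (N, k, 3) motion gaps
    return float((w[..., None] * diff ** 2).sum(-1).mean())
```

If all Gaussians move rigidly together the penalty is zero; dissimilar motion between feature-similar Gaussians is penalized in proportion to their similarity. In an online pipeline such a term would be added to the per-frame photometric and feature reconstruction losses.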
Problem

Research questions and friction points this paper is trying to address.

Online 2D and 3D point tracking from unposed monocular camera input.
Online reconstruction of dynamic scenes using 3D Gaussian splatting.
Learning point trajectories through robust image feature reconstruction, without correspondence-level supervision.
Innovation

Methods, ideas, or system contributions that make the work stand out.

First baseline for online 2D and 3D point tracking with unposed monocular cameras
Dynamic Online Monocular Reconstruction (DynOMo) via extended 3D Gaussian splatting
Similarity-enhanced regularization enabling trajectory emergence without correspondence supervision