LongDPM: Overlap-Aware 4D Reconstruction from Long Monocular Videos

📅 2026-05-17
📈 Citations: 0
Influential: 0

🤖 AI Summary
Existing methods struggle to keep dense geometry, camera motion, and temporal correspondences globally consistent when reconstructing dynamic 3D scenes from long monocular videos. This work proposes an overlap-aware framework that uses static-aware abstraction of overlap regions to enable confidence-weighted coordinate system registration, along with a mechanism for associating dynamic object identities across segments and fusing their trajectories. The approach achieves, for the first time, scalable long-sequence monocular 4D dense reconstruction that preserves local accuracy while ensuring global consistency, thereby overcoming the trade-off between sequence duration and reconstruction density. Experiments show reductions in dense tracking endpoint error (EPE) relative to V-DPM on PointOdyssey, Kubric-F, and Kubric-G, as well as the best absolute trajectory error (ATE) on TUM-dynamics.
📝 Abstract
Recovering a dynamic 3D scene from a long monocular video requires dense geometry, camera motion, and temporal correspondence to remain consistent in a shared coordinate system. Existing methods face two key challenges: (1) feed-forward reconstruction models provide accurate local predictions but are limited to short clips, and (2) long-range trackers preserve correspondences but do not produce a dense sequence-level reconstruction. This paper presents LongDPM, a novel overlap-aware framework for scalable long-range monocular dynamic reconstruction. First, LongDPM processes long videos in overlapping chunks, keeping inference memory bounded by the chunk length. Second, it connects chunk-local coordinate systems through confidence-weighted registration with static-aware overlap abstraction. Third, it associates dynamic identities across chunk boundaries and fuses matched trajectories to recover coherent long-range 3D motion. Experimental results demonstrate that LongDPM achieves superior long-range reconstruction and tracking performance, reducing dense tracking EPE relative to V-DPM on PointOdyssey, Kubric-F, and Kubric-G, while obtaining the best TUM-dynamics ATE for camera pose estimation.
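The first two steps of the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the chunk length and overlap values, and the choice of a weighted Umeyama-style similarity alignment are all assumptions made for illustration. The abstract only states that chunks overlap and that registration is confidence-weighted and static-aware. Here, static awareness is approximated by setting the weight of points flagged as dynamic to zero.

```python
import numpy as np


def overlapping_chunks(num_frames, chunk_len, overlap):
    """Split frame indices into (start, end) ranges sharing `overlap` frames.

    Hypothetical helper: the paper does not specify its chunking schedule.
    """
    if not 0 <= overlap < chunk_len:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_len")
    if num_frames <= 0:
        return []
    stride = chunk_len - overlap
    chunks, start = [], 0
    while True:
        end = min(start + chunk_len, num_frames)
        chunks.append((start, end))
        if end == num_frames:
            break
        start += stride
    return chunks


def weighted_similarity(src, dst, weights):
    """Estimate s, R, t minimizing sum_i w_i * ||s * R @ src_i + t - dst_i||^2.

    src, dst: (N, 3) arrays of the same points seen in two chunk-local frames.
    weights:  (N,) non-negative values, e.g. confidence times a static mask
              (an assumed stand-in for the paper's static-aware abstraction).
    """
    w = weights / weights.sum()
    mu_src = (w[:, None] * src).sum(axis=0)
    mu_dst = (w[:, None] * dst).sum(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst

    # Weighted cross-covariance between the centered point sets.
    cov = (w[:, None] * dst_c).T @ src_c
    U, S, Vt = np.linalg.svd(cov)

    # Flip the last axis if needed so that R is a proper rotation.
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])
    R = U @ D @ Vt

    var_src = (w * (src_c ** 2).sum(axis=1)).sum()
    s = (S * np.diag(D)).sum() / var_src
    t = mu_dst - s * R @ mu_src
    return s, R, t


# Usage: frames 0..9 in chunks of 4 with 2 shared frames.
print(overlapping_chunks(10, 4, 2))  # [(0, 4), (2, 6), (4, 8), (6, 10)]
```

Points in the shared frames of two neighbouring chunks would supply `src` and `dst`, and the recovered transform maps one chunk's coordinates into the other's. A similarity transform (with scale) is used here because monocular reconstructions are generally defined only up to scale; whether LongDPM estimates scale in this way is not stated in the abstract.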
Problem

Research questions and friction points this paper is trying to address.

4D reconstruction
monocular video
long-range tracking
dynamic scene
dense geometry
Innovation

Methods, ideas, or system contributions that make the work stand out.

overlap-aware reconstruction
long-range 4D reconstruction
monocular video
dynamic scene modeling
chunk-based registration