🤖 AI Summary
Existing dense 3D point tracking methods for dynamic scenes rely on pairwise matching, known camera poses, or strict temporal ordering, limiting their flexibility and generalization. This paper proposes the first single-pass, forward-only framework that requires no pose priors and supports joint processing of arbitrarily many input images, unifying dense point trajectory estimation and unsupervised 3D reconstruction. The approach employs a spatio-temporal backbone network to extract holistic deep features across space and time, and directly regresses pixel-level trajectories and geometric maps via multi-task heads, eliminating explicit inter-frame registration and hand-crafted motion modeling. Evaluated on multiple dynamic scene benchmarks, the method achieves state-of-the-art performance while significantly reducing memory footprint and improving inference efficiency. The code and datasets are publicly released.
📝 Abstract
Current methods for dense 3D point tracking in dynamic scenes typically rely on pairwise processing, require known camera poses, or assume a temporal ordering of input frames, constraining their flexibility and applicability. Additionally, recent advances have successfully enabled efficient 3D reconstruction from large-scale, unposed image collections, underscoring opportunities for unified approaches to dynamic scene understanding. Motivated by this, we propose DePT3R, a novel framework that simultaneously performs dense point tracking and 3D reconstruction of dynamic scenes from multiple images in a single forward pass. This multi-task learning is achieved by extracting deep spatio-temporal features with a powerful backbone and regressing pixel-wise maps with dense prediction heads. Crucially, DePT3R operates without requiring camera poses, substantially enhancing its adaptability and efficiency, which is especially important in dynamic environments with rapid changes. We validate DePT3R on several challenging benchmarks involving dynamic scenes, demonstrating strong performance and significant improvements in memory efficiency over existing state-of-the-art methods. Code and data are available via the open repository: https://github.com/StructuresComp/DePT3R
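To make the described data flow concrete, the sketch below mimics the pipeline's shape only: N unposed frames are processed jointly in one forward pass, a shared spatio-temporal backbone produces per-pixel features, and two dense prediction heads regress pixel-wise 3D trajectories and geometric point maps. All function names, dimensions, and the linear "heads" here are illustrative stand-ins, not DePT3R's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)


def backbone(frames, feat_dim=16):
    """Stand-in for the spatio-temporal backbone: maps N RGB frames to a
    shared feature volume (N, H, W, C). Hypothetical, not the paper's network."""
    n, h, w, _ = frames.shape
    return rng.standard_normal((n, h, w, feat_dim))


def dense_head(features, weight):
    """Stand-in dense prediction head: a per-pixel linear map from
    C-dim features to an output map (N, H, W, out_dim)."""
    return features @ weight


# N unposed frames of a dynamic scene, processed jointly in a single pass
N, H, W = 4, 8, 8
frames = rng.random((N, H, W, 3))

feats = backbone(frames)                        # shared spatio-temporal features
W_traj = rng.standard_normal((16, 3))           # trajectory-head weights (toy)
W_geom = rng.standard_normal((16, 3))           # geometry-head weights (toy)

trajectories = dense_head(feats, W_traj)        # per-pixel 3D trajectories
pointmaps = dense_head(feats, W_geom)           # per-pixel 3D point maps

print(trajectories.shape, pointmaps.shape)      # (4, 8, 8, 3) (4, 8, 8, 3)
```

The key property the sketch preserves is that both outputs come from one shared feature extraction, with no camera poses or pairwise frame matching anywhere in the loop.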