DePT3R: Joint Dense Point Tracking and 3D Reconstruction of Dynamic Scenes in a Single Forward Pass

📅 2025-12-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing dense 3D point tracking methods for dynamic scenes rely on pairwise matching, known camera poses, or strict temporal ordering, limiting flexibility and generalization. This paper proposes the first single-pass, forward-only framework that requires no pose priors and supports joint processing of arbitrarily many input images, unifying dense point trajectory estimation and unsupervised 3D reconstruction. Our approach employs a spatio-temporal backbone network to extract holistic deep features across space and time, and directly regresses pixel-level trajectories and geometric maps via multi-task heads—eliminating explicit inter-frame registration and hand-crafted motion modeling. Evaluated on multiple dynamic scene benchmarks, our method achieves state-of-the-art performance while significantly reducing memory footprint and improving inference efficiency. The code and datasets are publicly released.
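The data flow described above, a shared spatio-temporal backbone whose features feed separate dense prediction heads, can be sketched at the tensor-shape level. This is a minimal illustration only: the dimensions, function names, and per-pixel linear projections are assumptions for clarity, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions, not from the paper):
T, H, W, C = 4, 32, 32, 64  # frames, height, width, feature channels

# Stand-in for the backbone output: one shared feature vector
# per pixel per frame across the whole input sequence.
features = rng.standard_normal((T, H, W, C))

def dense_head(feats, out_dim, rng):
    """Hypothetical stand-in for a dense prediction head: a per-pixel
    linear projection from the shared features to out_dim channels."""
    weight = rng.standard_normal((feats.shape[-1], out_dim)) * 0.01
    return feats @ weight

# Multi-task heads regress per-pixel outputs from the SAME features,
# so tracking and geometry share one forward pass through the backbone:
trajectories = dense_head(features, 2, rng)  # (T, H, W, 2): 2D track offsets
pointmaps = dense_head(features, 3, rng)     # (T, H, W, 3): per-pixel 3D points

assert trajectories.shape == (T, H, W, 2)
assert pointmaps.shape == (T, H, W, 3)
```

The point of the sketch is the sharing pattern: because both heads read the same feature tensor, adding a task costs only one lightweight head rather than a second backbone pass, which is what makes the joint single-pass formulation memory-efficient.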

📝 Abstract
Current methods for dense 3D point tracking in dynamic scenes typically rely on pairwise processing, require known camera poses, or assume a temporal ordering of input frames, constraining their flexibility and applicability. Additionally, recent advances have successfully enabled efficient 3D reconstruction from large-scale, unposed image collections, underscoring opportunities for unified approaches to dynamic scene understanding. Motivated by this, we propose DePT3R, a novel framework that simultaneously performs dense point tracking and 3D reconstruction of dynamic scenes from multiple images in a single forward pass. This multi-task learning is achieved by extracting deep spatio-temporal features with a powerful backbone and regressing pixel-wise maps with dense prediction heads. Crucially, DePT3R operates without requiring camera poses, substantially enhancing its adaptability and efficiency, which is especially important in dynamic environments with rapid changes. We validate DePT3R on several challenging benchmarks involving dynamic scenes, demonstrating strong performance and significant improvements in memory efficiency over existing state-of-the-art methods. Data and code are available via the open repository: https://github.com/StructuresComp/DePT3R
Problem

Research questions and friction points this paper is trying to address.

Simultaneously tracks dense points and reconstructs 3D dynamic scenes
Operates without requiring known camera poses or temporal ordering
Enhances adaptability and efficiency in rapidly changing environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Single forward pass for dense tracking and 3D reconstruction
Operates without requiring known camera poses
Uses deep spatio-temporal features with dense prediction heads
Vivek Alumootil
University of California, Los Angeles
Tuan-Anh Vu
University of California, Los Angeles
M. Khalid Jawed
UCLA (Structures-Computer Interaction Lab)
Solid and structural mechanics, robotics, physics-assisted machine learning