C4D: 4D Made from 3D through Dual Correspondences

📅 2025-10-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
Monocular video 4D reconstruction, which simultaneously estimates dynamic geometry and camera poses, faces fundamental challenges when moving objects are present, because classical multi-view geometry assumptions break down. This paper proposes C4D, an end-to-end framework for dynamic-scene 4D reconstruction. It establishes dual-level spatiotemporal correspondences by jointly leveraging short-term optical flow and long-term point tracking; introduces a dynamics-aware point tracker and a motion-mask estimation module that decouple the static background from dynamic objects during optimization; and adds 2D–3D trajectory refinement with a multi-task joint loss that jointly optimizes depth, camera poses, and point trajectories. Evaluated on multiple benchmarks, the method achieves state-of-the-art performance in depth estimation, camera pose prediction, and long-term point tracking, and the reconstructed 4D scenes are geometrically consistent, temporally smooth, and topologically complete.
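The trajectory-refinement idea above rests on a standard geometric operation: given per-frame depth, camera intrinsics, and camera poses, 2D point tracks can be back-projected into world-space 3D trajectories. The sketch below is a minimal NumPy illustration of that lifting step only, not the paper's implementation; the function and argument names are hypothetical.

```python
import numpy as np

def lift_tracks_to_3d(tracks_2d, depths, intrinsics, cam_to_world):
    """Lift 2D point tracks into world-space 3D trajectories.

    tracks_2d:    (T, N, 2) pixel coordinates of N tracked points per frame
    depths:       (T, H, W) per-frame depth maps
    intrinsics:   (3, 3) pinhole intrinsic matrix K
    cam_to_world: (T, 4, 4) per-frame camera-to-world poses
    Returns a (T, N, 3) array of world-space trajectories.
    """
    T, N, _ = tracks_2d.shape
    fx, fy = intrinsics[0, 0], intrinsics[1, 1]
    cx, cy = intrinsics[0, 2], intrinsics[1, 2]
    traj = np.zeros((T, N, 3))
    for t in range(T):
        u, v = tracks_2d[t, :, 0], tracks_2d[t, :, 1]
        # Sample depth at the (rounded) track locations; depths index is (row, col).
        z = depths[t, v.round().astype(int), u.round().astype(int)]
        # Back-project pixels to camera space, then transform to world space.
        pts_cam = np.stack([(u - cx) * z / fx, (v - cy) * z / fy, z], axis=-1)
        pts_h = np.concatenate([pts_cam, np.ones((N, 1))], axis=-1)
        traj[t] = (cam_to_world[t] @ pts_h.T).T[:, :3]
    return traj
```

In the full method, the 2D tracks, depths, and poses would all come from the jointly optimized network outputs, which is what makes the resulting 3D trajectories smooth and consistent.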

📝 Abstract
Recovering 4D from monocular video, which jointly estimates dynamic geometry and camera poses, is an inherently challenging problem. While recent pointmap-based 3D reconstruction methods (e.g., DUSt3R) have made great progress in reconstructing static scenes, directly applying them to dynamic scenes leads to inaccurate results. This discrepancy arises because moving objects violate multi-view geometric constraints, disrupting the reconstruction. To address this, we introduce C4D, a framework that leverages temporal Correspondences to extend the existing 3D reconstruction formulation to 4D. Specifically, apart from predicting pointmaps, C4D captures two types of correspondences: short-term optical flow and long-term point tracking. We train a dynamic-aware point tracker that provides additional mobility information, facilitating the estimation of motion masks to separate moving elements from the static background, thus offering more reliable guidance for dynamic scenes. Furthermore, we introduce a set of dynamic scene optimization objectives to recover per-frame 3D geometry and camera parameters. Simultaneously, the correspondences lift 2D trajectories into smooth 3D trajectories, enabling fully integrated 4D reconstruction. Experiments show that our framework achieves complete 4D recovery and demonstrates strong performance across multiple downstream tasks, including depth estimation, camera pose estimation, and point tracking. Project Page: https://littlepure2333.github.io/C4D
Problem

Research questions and friction points this paper is trying to address.

Recovering 4D dynamic geometry from monocular video sequences
Addressing inaccurate reconstruction when moving objects violate geometric constraints
Extending 3D reconstruction methods to handle dynamic scenes with motion
Innovation

Methods, ideas, or system contributions that make the work stand out.

Captures dual correspondences, short-term optical flow and long-term point tracking, to extend 3D reconstruction to 4D
Trains a dynamic-aware point tracker whose mobility cues drive motion mask estimation
Introduces dynamic scene optimization objectives that jointly recover geometry, camera parameters, and 3D trajectories
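One intuitive way to see how a motion mask can separate moving objects from the static background is to compare the observed optical flow with the "rigid flow" that camera motion alone would induce on a static scene. C4D learns this separation with a dedicated tracker and mask module, so the threshold-based sketch below is only an illustrative stand-in under that simplification; all names and the `thresh` parameter are hypothetical.

```python
import numpy as np

def motion_mask_from_flow(flow, depth, K, rel_pose, thresh=2.0):
    """Flag likely-dynamic pixels by comparing observed optical flow with the
    rigid flow induced by camera motion alone (a simplified stand-in for a
    learned motion-mask module).

    flow:     (H, W, 2) observed optical flow from frame t to t+1
    depth:    (H, W) depth map at frame t
    K:        (3, 3) pinhole intrinsic matrix
    rel_pose: (4, 4) relative camera motion from frame t to t+1
    Returns a boolean (H, W) mask, True where motion deviates from rigid flow.
    """
    H, W = depth.shape
    v, u = np.mgrid[0:H, 0:W].astype(np.float64)
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    # Back-project every pixel to a 3D point in the frame-t camera.
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    pts = np.stack([x, y, depth, np.ones_like(depth)], axis=-1)  # (H, W, 4)
    # Move the points into the frame-(t+1) camera and reproject to pixels.
    pts2 = pts @ rel_pose.T
    z2 = np.clip(pts2[..., 2], 1e-6, None)
    u2 = fx * pts2[..., 0] / z2 + cx
    v2 = fy * pts2[..., 1] / z2 + cy
    rigid_flow = np.stack([u2 - u, v2 - v], axis=-1)
    # A large residual between observed and rigid flow suggests a moving object.
    residual = np.linalg.norm(flow - rigid_flow, axis=-1)
    return residual > thresh
```

Masking out the pixels flagged this way leaves only the static background to constrain camera pose estimation, which is the role the motion mask plays in the paper's optimization.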