DINO_4D: Semantic-Aware 4D Reconstruction

📅 2026-04-10
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses semantic drift in 4D reconstruction of dynamic scenes, a failure mode that hinders the integration of geometric and high-level semantic information. To mitigate it, the study injects frozen DINOv3 features as structural priors into the 4D reconstruction pipeline, enabling semantic-aware dynamic tracking and substantially suppressing semantic drift. The resulting representation balances geometric accuracy with semantic consistency. On the Point Odyssey and TUM-Dynamics benchmarks, the method significantly improves tracking accuracy (measured by APD) and reconstruction completeness while preserving the linear O(T) time complexity of its predecessors.
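The summary names APD without defining it. One common reading, assumed here, is "average percent of points within delta": for each distance threshold, take the fraction of tracked points whose predicted 3D position lies within that threshold of ground truth, then average over thresholds. The sketch below computes that quantity; the function name `apd` and the threshold values are hypothetical placeholders, not the benchmarks' official settings.

```python
import math

def apd(pred, gt, thresholds=(0.1, 0.2, 0.4, 0.8, 1.6)):
    """Average percentage of points within a set of distance thresholds.

    pred, gt: equal-length lists of (x, y, z) points in one-to-one
        correspondence (predicted vs. ground-truth track positions).
    thresholds: hypothetical distance thresholds in scene units; the
        values actually used by Point Odyssey / TUM-Dynamics may differ.
    Returns a percentage in [0, 100].
    """
    if not pred or len(pred) != len(gt):
        raise ValueError("pred and gt must be non-empty and of equal length")
    # Euclidean error of every tracked point.
    errors = [math.dist(p, g) for p, g in zip(pred, gt)]
    # Fraction of points strictly inside each threshold.
    per_threshold = [sum(e < tau for e in errors) / len(errors) for tau in thresholds]
    # Average over thresholds, expressed as a percentage.
    return 100.0 * sum(per_threshold) / len(per_threshold)

# One exact point and one point 1.0 away: the second only counts at tau = 1.6.
print(apd([(0, 0, 0), (1, 0, 0)], [(0, 0, 0), (0, 0, 0)]))  # -> 60.0
```

Averaging over several thresholds rewards trackers that are both roughly right everywhere and precise on many points, rather than tuning to a single cutoff.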

📝 Abstract
At the intersection of computer vision and robotic perception, 4D reconstruction of dynamic scenes serves as the critical bridge between low-level geometric sensing and high-level semantic understanding. We present DINO_4D, which introduces frozen DINOv3 features as structural priors, injecting semantic awareness into the reconstruction process to suppress semantic drift during dynamic tracking. Experiments on the Point Odyssey and TUM-Dynamics benchmarks show that our method retains the linear time complexity O(T) of its predecessors while significantly improving Tracking Accuracy (APD) and Reconstruction Completeness. DINO_4D establishes a new paradigm for constructing 4D World Models that combine geometric precision with semantic understanding.
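The abstract does not describe how the frozen features enter the tracker. The following is a minimal, hypothetical sketch of the general idea only, not the DINO_4D algorithm: a greedy tracker picks, in each frame, the candidate position that minimizes the geometric jump plus a semantic penalty 1 - cos(f, f_ref) against a reference descriptor that is computed once and never updated. The names `track_point`, `step_cost`, and `lam`, the toy 2-D feature vectors standing in for DINOv3 descriptors, and the greedy matching itself are all assumptions made for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def step_cost(prev_pos, candidate, ref_feat, lam):
    """Geometric jump plus a penalty for drifting away from ref_feat."""
    pos, feat = candidate
    return math.dist(prev_pos, pos) + lam * (1.0 - cosine(feat, ref_feat))

def track_point(frames, start_pos, ref_feat, lam=1.0):
    """Greedily track one point through T frames (hypothetical sketch).

    frames: per-frame lists of (position, feature) candidates; the features
        stand in for frozen DINOv3 descriptors.
    ref_feat: descriptor of the query point, fixed once and never updated,
        so a bad match cannot change what later frames are compared against.
    lam: hypothetical weight of the semantic term (lam=0 is geometry only).
    With a bounded number of candidates per frame, each frame costs constant
    work, so tracking T frames costs O(T).
    """
    track = [start_pos]
    pos = start_pos
    for candidates in frames:
        pos, _ = min(candidates, key=lambda c: step_cost(pos, c, ref_feat, lam))
        track.append(pos)
    return track

# Toy usage: in each frame a semantically different distractor sits closer.
ref = [1.0, 0.0]
frames = [
    [((0.1, 0.0, 0.0), [0.0, 1.0]), ((0.3, 0.0, 0.0), [1.0, 0.0])],
    [((0.35, 0.0, 0.0), [0.0, 1.0]), ((0.6, 0.0, 0.0), [0.9, 0.1])],
]
semantic = track_point(frames, (0.0, 0.0, 0.0), ref)
# -> [(0.0, 0.0, 0.0), (0.3, 0.0, 0.0), (0.6, 0.0, 0.0)]
geometric = track_point(frames, (0.0, 0.0, 0.0), ref, lam=0.0)
# -> [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.35, 0.0, 0.0)]
```

With `lam=0.0` the tracker latches onto the nearer distractor and stays on it, which is the kind of drift the abstract describes; the frozen semantic term keeps the track on the target without adding any per-frame cost that grows with T.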
Problem

Research questions and friction points this paper is trying to address.

4D reconstruction
semantic drift
dynamic scenes
semantic understanding
geometric precision
Innovation

Methods, ideas, or system contributions that make the work stand out.

4D reconstruction
semantic-aware
DINOv3 features
dynamic scene
world model